Data Analytics MCQ 15000+
Data Analytics
Unit 1:
1. Data Analysis is a process of?
A. inspecting data B. cleaning data C. transforming data D. All of the above
2. Which of the following is not a major data analysis approach?
A. Data Mining B. Predictive Intelligence C. Business Intelligence D. Text Analytics
3. How many main statistical methodologies are used in data analysis?
A. 2 B. 3 C. 4 D. 5
4. In descriptive statistics, data from the entire population or a sample is summarized with?
A. integer descriptors B. floating descriptors C. numerical descriptors D. decimal descriptors
5. Data Analysis was defined by which statistician?
A. William S. B. Hans Peter Luhn C. Gregory Piatetsky-Shapiro D. John Tukey
6. Which of the following is true about hypothesis testing?
A. answering yes/no questions about the data B. estimating numerical characteristics of the data C. describing associations within the data D. modeling relationships within the data
7. The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
A. TRUE B. FALSE C. Can be true or false D. Can not say
8. The branch of statistics which deals with development of particular statistical methods is classified as
A. industry statistics B. economic statistics C. applied statistics D. applied statistics
9. Which of the following is true about regression analysis?
A. answering yes/no questions about the data B. estimating numerical characteristics of the data C. modeling relationships within the data D. describing associations within the data
10. Text Analytics is also referred to as Text Mining.
A. TRUE B. FALSE C. Can be true or false D. Can not say
11. In an Internet context, this is the practice of tailoring Web pages to individual users’ characteristics or preferences. 1. Web services 2. customer-facing 3. client/server 4. personalization
12. This is the processing of data about customers and their relationship with the enterprise in order to improve the enterprise’s future sales and service and lower cost. 1. clickstream analysis 2. database marketing 3. customer relationship management 4. CRM analytics
13. This is a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions. 1. best practice 2. data mart
3. business information warehouse 4. business intelligence
14. This is a systematic approach to the gathering, consolidation, and processing of consumer data (both for customers and potential customers) that is maintained in a company’s databases 1. database marketing 2. marketing encyclopedia 3. application integration 4. service oriented integration
15. This is an arrangement in which a company outsources some or all of its customer relationship management functions to an application service provider (ASP). 1. spend management 2. supplier relationship management 3. hosted CRM 4. Customer Information Control System
16. What are the five V’s of Big Data? 1. Volume 2. velocity 3. Variety 4. All of the above
17. ____ hides the limitations of Java behind a powerful and concise Clojure API for Cascading. 1. Scalding 2. Cascalog 3. Hcatalog 4. Hcalding
18. What are the main components of Big Data? 1. MapReduce 2. HDFS 3. YARN
4. All of these
19. What are the different features of Big Data Analytics? 1. Open-Source 2. Scalability 3. Data Recovery 4. All the above
20. Define the Port Numbers for NameNode, Task Tracker and Job Tracker 1. NameNode 2. Task Tracker 3. Job Tracker 4. All of the above
21. Facebook Tackles Big Data With ____ based on Hadoop 1. Project Prism 2. Prism 3. ProjectData 4. ProjectBid
22. Which of the following is not a phase of Data Analytics Life Cycle? 1. Communication 2. Recall 3. Data Preparation 4. Model Planning
UNIT 2: DATA ANALYSIS
1. In regression, the equation that describes how the response variable (y) is related to the explanatory variable (x) is: a. the correlation model b. the regression model c. used to compute the correlation coefficient d. None of these alternatives is correct.
2. The relationship between number of beers consumed (x) and blood alcohol content (y) was studied in 16 male college students by using least squares regression. The following regression equation was obtained from this study: ŷ = -0.0127 + 0.0180x. The above equation implies that:
a. each beer consumed increases blood alcohol by 1.27%
b. on average it takes 1.8 beers to increase blood alcohol content by 1%
c. each beer consumed increases blood alcohol by an average amount of 1.8%
d. each beer consumed increases blood alcohol by exactly 0.018
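A quick sanity check of the slope interpretation in Question 2 (a minimal Python sketch; the intercept and slope are taken from the question itself, not from any real dataset):

```python
# Fitted least-squares line from the question: y_hat = -0.0127 + 0.0180 * x
# (x = beers consumed, y = blood alcohol content)
def predict_bac(beers):
    return -0.0127 + 0.0180 * beers

# The change in predicted BAC for one extra beer equals the slope:
print(round(predict_bac(4) - predict_bac(3), 4))   # 0.018, i.e. about 1.8% on average
```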
3. SSE can never be
a. larger than SST
b. smaller than SST
c. equal to 1
d. equal to zero
4. Regression modeling is a statistical framework for developing a mathematical equation that describes how
a. one explanatory and one or more response variables are related
b. several explanatory and several response variables response are related
c. one response and one or more explanatory variables are related
d. All of these are correct.
5. In regression analysis, the variable that is being predicted is the
a. response, or dependent, variable
b. independent variable
c. intervening variable
d. is usually x
6. Regression analysis was applied to return rates of sparrowhawk colonies, studying the relationship between return rate (x: % of birds that return to the colony in a given year) and immigration rate (y: % of new adults that join the colony per year). The following regression equation
was obtained: ŷ = 31.9 – 0.34x. Based on the above estimated regression equation, if the return rate were to decrease by 10% the rate of immigration to the colony would:
a. increase by 34%
b. increase by 3.4%
c. decrease by 0.34%
d. decrease by 3.4%
7. In least squares regression, which of the following is not a required assumption about the error term ε?
a. The expected value of the error term is one.
b. The variance of the error term is the same for all values of x.
c. The values of the error term are independent.
d. The error term is normally distributed.
8. Larger values of r² (R²) imply that the observations are more closely grouped about the
a. average value of the independent variables
b. average value of the dependent variable
c. least squares line
d. origin
9. In a regression analysis if r² = 1, then
a. SSE must also be equal to one
b. SSE must be equal to zero
c. SSE can be any positive value
d. SSE must be negative
10. Which type of multivariate analysis should be used when a researcher wants to reduce a set of variables to a smaller set of composite variables by identifying underlying dimensions of the data?
A)Conjoint analysis
B)Cluster analysis
C)Multiple regression analysis
D)Factor analysis
11. Which type of multivariate analysis should be used when a researcher wants to estimate the utility that consumers associate with different product features?
A)Conjoint analysis
B)Cluster analysis
C)Multiple regression analysis
D)Factor analysis
12. Which type of multivariate analysis should be used when a researcher wants to identify subgroups of individuals that are homogeneous within subgroups and different from other subgroups?
A)Conjoint analysis
B)Cluster analysis
C)Multiple regression analysis
D)Factor analysis
13. Which type of multivariate analysis should be used when a researcher wants to predict group membership on the basis of two or more independent variables?
A)Conjoint analysis
B)Cluster analysis
C)Multiple regression analysis
D)Multiple discriminant analysis
14. Support vector machine (SVM) is a _________ classifier. a) Discriminative b) Generative
15. SVM can be used to solve ___________ problems. a) Classification b) Regression c) Clustering d) Both Classification and Regression
16. SVM is a ___________ learning algorithm. a) Supervised b) Unsupervised
17. SVM is termed as a ________ classifier. a) Minimum margin b) Maximum margin
18. The training examples closest to the separating hyperplane are called _______ a) Training vectors b) Test vectors
19. A factor analysis is…, while a principal components analysis is…
A) A broad term, the most commonly used technique for doing factor analysis.
B) The most commonly used technique for doing factor analysis, a broad term.
C) Both of the above
D) None of the above
20. Dimension Reduction is defined as:
A) It is a process of converting a data set having vast dimensions into a data set with lesser dimensions.
B) It ensures that the converted data set conveys similar information concisely.
C) All of the above
D) None of the above
21. What is the form of Fuzzy logic? a) Two-valued logic b) Crisp set logic c) Many-valued logic d) Binary set logic
22. Traditional set theory is also known as Crisp Set theory. a) True b) False
23. The truth values of traditional set theory is ____________ and that of fuzzy set is __________ a) Either 0 or 1, between 0 & 1 b) Between 0 & 1, either 0 or 1 c) Between 0 & 1, between 0 & 1 d) Either 0 or 1, either 0 or 1
24. Fuzzy logic is extension of Crisp set with an extension of handling the concept of Partial Truth. a) True b) False
25. The room temperature is hot. Here the hot (use of linguistic variable is used) can be represented by _______ a) Fuzzy Set b) Crisp Set c) Fuzzy & Crisp Set
d) None of the mentioned
26. The values of the set membership is represented by ___________ a) Discrete Set b) Degree of truth c) Probabilities d) Both Degree of truth & Probabilities
27. Japanese were the first to utilize fuzzy logic practically on high-speed trains in Sendai. a) True b) False
28. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the following. a) AND b) OR c) NOT d) All of the mentioned
29. There are also other operators, more linguistic in nature, called __________ that can be applied to fuzzy set theory. a) Hedges b) Lingual Variable c) Fuzz Variable d) None of the mentioned
30. Fuzzy logic is usually represented as ___________ a) IF-THEN-ELSE rules b) IF-THEN rules c) Both IF-THEN-ELSE rules & IF-THEN rules d) None of the mentioned
31. Like relational databases, there also exist fuzzy relational databases. a) True b) False
32. ______________ is/are the way/s to represent uncertainty. a) Fuzzy Logic b) Probability c) Entropy d) All of the mentioned
33. ____________ are algorithms that learn from their more complex environments (hence eco) to generalize, approximate and simplify solution logic. a) Fuzzy Relational DB b) Ecorithms c) Fuzzy Set d) None of the mentioned
Unit 3:
1 : What do you mean by sampling of stream data?
1. Sampling reduces the amount of data fed to a subsequent data mining algorithm. 2. Sampling reduces the diversity of the data stream 3. Sampling aims to keep statistical properties of the data intact. 4. Sampling algorithms often don't need multiple passes over the data
Question 2 : If a distance measure satisfies d(x, y) = d(y, x), then it is called
1. Symmetric 2. identical 3. positiveness 4. triangle inequality
Question 3 : NOSQL is
1. Not only SQL 2. Not SQL 3. Not Over SQL 4. No SQL
Question 4 : Find the L1 and L2 distances between the points (5, 6, 7) and (8, 2, 4).
1. L1 =10 , L2 = 5.83 2. L1 =10 , L2 = 5 3. L1 =11 , L2 = 4.9
4. L1 =9 , L2 = 5.83
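A minimal check of Question 4 using only the Python standard library:

```python
import math

a = (5, 6, 7)
b = (8, 2, 4)

l1 = sum(abs(x - y) for x, y in zip(a, b))               # Manhattan (L1) distance
l2 = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))  # Euclidean (L2) distance

print(l1, round(l2, 2))   # 10 5.83
```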
Question 5 : The time between elements of one stream
1. need not be uniform 2. need to be uniform 3. must be 1ms. 4. must be 1ns
Question 6 : A Reduce task receives
1. one or more keys and their associated value list 2. key value pair 3. list of keys and their associated values 4. list of key value pairs
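To illustrate Question 6: the framework groups all values emitted for a key, so each Reduce call receives one key plus its associated value list. Below is a toy single-process word-count sketch; the helper names are made up for illustration and are not part of the Hadoop API.

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit (word, 1) for every word
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    # Group values by key, as the framework does before the reduce phase
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped.items()

def reduce_phase(key, values):
    # Reducer: receives one key and its associated value list
    return key, sum(values)

lines = ["big data big ideas", "data streams"]
print([reduce_phase(k, v) for k, v in shuffle(map_phase(lines))])
# [('big', 2), ('data', 2), ('ideas', 1), ('streams', 1)]
```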
Question 7 : Which of the following statements about data streaming is true?
1. Stream data is always unstructured data. 2. Stream data often has a high velocity. 3. Stream elements cannot be stored on disk. 4. Stream data is always structured data.
Question 8 : Hadoop is the solution for:
1. Database software 2. Big Data Software 3. Data Mining software 4. Distribution software
Question 9 : ETL stands for ________________
1. Extraction transformation and loading 2. Extract Taken Lend 3. Enterprise Transfer Load 4. Entertainment Transference Load
Question 10 : “Sharding” a database across many server instances can be achieved with _______________
1. MAN 2. LAN 3. WAN 4. SAN
Question 11 : Neo4j is an example of which of the following NoSQL architectural pattern?
1. Key-value store 2. Graph Store 3. Document Store 4. Column-based Store
Question 12 : CSV and JSON can be described as
1. Structured data 2. Unstructured data 3. Semi-structured data 4. Multi-structured data
Question 13 : The hardware term used to describe Hadoop hardware requirements is
1. Commodity firmware
2. Commodity software 3. Commodity hardware 4. Cluster hardware
Question 14 : Which of the following is not a Hadoop Distribution?
1. MAPR 2. Cloudera 3. Hortonworks 4. RMAP
Question 15 : Which of the following operations can be implemented with Combiners?
1. Selection 2. Projection 3. Natural Join 4. Union
Question 16 : ________ stores are used to store information about networks, such as social connections.
1. Key-value 2. Wide-column 3. Document 4. graph
Question 17 : The DGIM algorithm was developed to estimate the count of 1's that occur within the last k bits of a stream window N. Which of the following statements is true about the estimate of the number of 0's based on DGIM?
1. The number of 0's cannot be estimated at all. 2. The number of 0's can be estimated with a maximum guaranteed error
3. To estimate the number of 0's and 1's with a guaranteed maximum error, DGIM has to be employed twice, once creating buckets based on 1's and once creating buckets based on 0's. 4. Determine whether an element has already occurred in previous stream data.
Question 18 : If size of file is 4 GB and block size is 64 MB then number of mappers required for MapReduce task is
1. 8 2. 16 3. 32 4. 64
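The arithmetic behind Question 18, assuming one map task per input split and a split size equal to the 64 MB block size:

```python
file_size_mb = 4 * 1024       # 4 GB expressed in MB
block_size_mb = 64
print(file_size_mb // block_size_mb)   # 64 map tasks
```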
Question 19 : Which of the following is not the default daemon of Hadoop?
1. Namenode 2. Datanode 3. Job Tracker 4. Job history server
Question 20 : In Bloom filter an array of n bits is initialized with
1. all 0s 2. all 1s 3. half 0s and half 1s 4. all -1
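A minimal Bloom-filter sketch for Question 20, showing the bit array initialized to all 0s. The way the k bit positions are derived here is purely illustrative, not what any particular library does.

```python
import hashlib

N_BITS = 64
bits = [0] * N_BITS          # an array of n bits, initialized with all 0s

def positions(item, k=3):
    # Derive k bit positions from one digest (illustrative choice of hashes)
    digest = hashlib.sha256(item.encode()).digest()
    return [digest[i] % N_BITS for i in range(k)]

def add(item):
    for p in positions(item):
        bits[p] = 1

def might_contain(item):
    # False positives are possible; false negatives are not
    return all(bits[p] == 1 for p in positions(item))

add("alice@example.com")
print(might_contain("alice@example.com"))   # True
print(might_contain("bob@example.com"))     # almost certainly False
```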
Question 21 : _____________is a batch-based, distributed computing framework modeled after Google’s paper.
1. MapCompute 2. MapReuse 3. MapCluster
4. MapReduce
Question 22 : What is the edit distance between A=father and B=feather ?
1. 5 2. 1 3. 4 4. 2
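A standard dynamic-programming edit distance for Question 22. With insertions, deletions and substitutions allowed (and also with the insert/delete-only definition used for streams), turning "father" into "feather" needs a single insertion of 'e', so the distance is 1.

```python
def edit_distance(a, b):
    # dp[i][j] = edits needed to turn a[:i] into b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete
                           dp[i][j - 1] + 1,         # insert
                           dp[i - 1][j - 1] + cost)  # match / substitute
    return dp[-1][-1]

print(edit_distance("father", "feather"))   # 1
```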
Question 23 : Sliding window operations typically fall in the category
1. OLTP Transactions 2. Big Data Batch Processing 3. Big Data Real Time Processing 4. Small Batch Processing
Question 24 : _________ systems focus on the relationship between users and items for recommendation.
1. DGIM 2. Collaborative-Filtering 3. Content Based and Collaborative Filtering 4. Content Based
Question 25 : Find Hamming Distance for vectors A=100101011 B=100010010
1. 2 2. 4 3. 3 4. 1
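Question 25 worked out directly from the two bit strings in the question:

```python
a = "100101011"
b = "100010010"
hamming = sum(x != y for x, y in zip(a, b))   # count positions where the bits differ
print(hamming)   # 4
```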
Question 26 : During start up, the ___________ loads the file system state from the fsimage and the edits log file.
1. Datanode 2. Namenode 3. Secondary Namenode 4. Rack awareness policy
Question 27 : What is finally produced by Hierarchical Agglomerative Clustering?
1. final estimate of cluster centroids 2. assignment of each point to clusters 3. tree showing how close things are to each other 4. Group of clusters
Question 28 : The Jaccard similarity of two non-binary sets A and B is defined by __________
1. Jaccard Index 2. Primary Index 3. Secondary Index 4. Clustered Index
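For Question 28: the Jaccard index of two sets is the size of their intersection divided by the size of their union. A small sketch with made-up example sets:

```python
def jaccard(a, b):
    return len(a & b) / len(a | b)

print(jaccard({1, 2, 3, 4}, {2, 3, 5}))   # 2 / 5 = 0.4
```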
Question 29 : Which of the following is based on the grid-like street geography of New York?
1. Manhattan Distance 2. Edit Distance 3. Hamming distance 4. Lp distance
Question 30 : The FM-sketch algorithm can be used to:
1. Estimate the number of distinct elements. 2. Sample data with a time-sensitive window. 3. Estimate the frequent elements. 4. Determine whether an element has already occurred in previous stream data.
Question 31 : Pick a hash function h that maps each of the N elements to at least log2 N bits. If R is the maximum number of trailing 0's observed in the hashed values, the estimated number of distinct elements is
1. 2^R 2. 2^(-R) 3. 1-(2^R) 4. 1-(2^(-R))
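A toy Flajolet-Martin style sketch for Question 31: hash each element, keep the maximum number of trailing zero bits R seen so far, and estimate the number of distinct elements as 2^R. The choice of hash here is only illustrative, and a single estimator like this is very noisy in practice.

```python
import hashlib

def trailing_zeros(n):
    count = 0
    while n > 0 and n % 2 == 0:
        n //= 2
        count += 1
    return count

def fm_estimate(stream):
    r = 0
    for element in stream:
        h = int(hashlib.md5(str(element).encode()).hexdigest(), 16)
        r = max(r, trailing_zeros(h))
    return 2 ** r   # estimated number of distinct elements

print(fm_estimate(["a", "b", "c", "a", "b", "d"]))
```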
Question 32 : Which of the following is not a characteristic of stream data?
1. Continuous 2. ordered 3. persistent 4. huge
Question 33 : Which of the following is a column-oriented database that runs on top of HDFS
1. Hive 2. Sqoop 3. Hbase 4. Flume
Question 34 : Which of the following decides the number of partitions that are created on the local file system of the worker nodes?
1. Number of map tasks 2. Number of reduce tasks 3. Number of file input splits 4. Number of distinct keys in the intermediate key-value pairs
Question 35 : Which of the following is not the class of points in BFR algorithm
1. Discard Set (DS) 2. Compression Set (CS) 3. Isolation Set (IS) 4. Retained Set (RS)
Question 36 : Which of the following is not one of the 5 V's of Big Data?
1. Volume 2. Variable 3. Velocity 4. Value
Question 37 : Which algorithm is used to find fully connected subgraphs in social media mining?
1. CURE 2. CPM 3. SimRank 4. Girvan-Newman Algorithm
Question 38 : A ________________ query Q is a query that is issued once over a database D, and then logically runs continuously over the data in D until Q is terminated.
1. One-time Query 2. Standing Query 3. Adhoc Query
4. General Query
Question 39 : What is the effect of a spider trap on PageRank?
1. A particular page gets the highest PageRank 2. All the pages of the web will get 0 PageRank 3. no effect on any page 4. affects a particular set of pages
Question 40 : Which of the following is correct option for MongoDB
1. MongoDB is column oriented data store 2. MongoDB uses XML more in comparison with JSON 3. MongoDB is a document store database 4. MongoDB is a key-value data store
Question 41 : _________ systems focus on the relationship between users and items for recommendation.
1. DGIM 2. Collaborative-Filtering 3. Content Based and Collaborative Filtering 4. Content Based
Question 42 : The graphical representation of an SNA is made up of links and _____________.
1. People 2. Networks 3. Nodes 4. Computers
Question 43 : Hadoop is a framework that works with a variety of related tools. Common Hadoop ecosystem tools include ____________
1. MapReduce, Hummer and Iguana 2. MapReduce, Hive and HBase 3. MapReduce, MySQL and Google Apps 4. MapReduce, Heron and Trumpet
Question 44 : Which of the following statements about data streaming is true?
1. Stream data is always unstructured data. 2. Stream data often has a high velocity. 3. Stream elements cannot be stored on disk. 4. Stream data is always structured data.
Question 45 : Which of the following is a NoSQL Database Type ?
1. SQL 2. JSON 3. Document databases 4. CSV
Question 46 : Techniques for fooling search engines into believing your page is about something it is not, are called _____________.
1. term spam 2. page rank 3. phishing 4. dead ends
Question 47 : The police set up checkpoints at randomly selected road locations, then inspected every driver at those locations. What type of sample is this?
1. Simple Random Sample 2. Stratified Random Sample 3. Cluster Random Sample 4. Uniform sampling
Question 48 : Which of the following statements about standard Bloom filters is correct?
1. It is possible to delete an element from a Bloom filter. 2. A Bloom filter always returns the correct result. 3. It is possible to alter the hash functions of a full Bloom filter to create more space. 4. A Bloom filter always returns TRUE when testing for a previously added element.
Question 49 : Which of the following is responsible for managing the cluster resources and use them for scheduling users’ applications?
1. Hadoop Common 2. YARN 3. HDFS 4. MapReduce
Question 50 : ___________ is related to inconsistency in the data, which in turn hampers the data analysis process or creates hurdles for those who wish to analyze this form of data.
1. Variability 2. Variety 3. Volume 4. Complexity
Unit 4:
Question 1 This clustering algorithm terminates when mean values computed for the current iteration of the algorithm are identical to the computed mean values for the previous iteration Select one:
a. K-Means clustering
b. conceptual clustering
c. expectation maximization
d. agglomerative clustering
Question 2 This clustering approach initially assumes that each data instance represents a single cluster. Select one:
a. expectation maximization
b. K-Means clustering
c. agglomerative clustering
d. conceptual clustering
Question 3 The correlation coefficient for two real-valued attributes is – 0.85. What does this value tell you? Select one:
a. The attributes are not linearly related.
b. As the value of one attribute decreases the value of the second attribute increases.
c. As the value of one attribute increases the value of the second attribute also increases.
d. The attributes show a linear relationship
Question 4 Time Complexity of k-means is given by Select one:
a. O(mn)
b. O(tkn)
c. O(kn)
d. O(t2kn)
Question 5 Given a rule of the form IF X THEN Y, rule confidence is defined as the conditional probability that Select one:
a. Y is false when X is known to be false.
b. Y is true when X is known to be true.
c. X is true when Y is known to be true
d. X is false when Y is known to be false.
Question 6 Chameleon is Select one:
a. Density based clustering algorithm
b. Partitioning based algorithm
c. Model based algorithm
d. Hierarchical clustering algorithm
Question 7 In _________ clusterings, points may belong to multiple clusters Select one:
a. Non-exclusive
b. Partial
c. Fuzzy
d. Exclusive
Question 8 Find odd man out Select one:
a. DBSCAN
b. K mean
c. PAM
d. K medoid
Question 9 Which statement is true about the K-Means algorithm? Select one:
a. The output attribute must be categorical.
b. All attribute values must be categorical.
c. All attributes must be numeric
d. Attribute values may be either categorical or numeric
Question 10 This data transformation technique works well when minimum and maximum values for a real-valued attribute are known. Select one:
a. z-score normalization
b. min-max normalization
c. logarithmic normalization
d. decimal scaling
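The transformation behind Question 10: min-max normalization rescales each value into [0, 1] using the known minimum and maximum, x' = (x - min) / (max - min). A small sketch:

```python
def min_max_normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([20, 30, 40, 50]))   # [0.0, 0.33..., 0.67..., 1.0]
```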
Question 11 The number of iterations in apriori ___________ Select one:
a. increases with the size of the data
b. decreases with the increase in size of the data
c. increases with the size of the maximum frequent set
d. decreases with increase in size of the maximum frequent set
Question 12 Which of the following are interestingness measures for association rules? Select one:
a. recall
b. lift
c. accuracy
d. compactness
Question 13 Which one of the following is not a major strength of the neural network approach? Select one:
a. Neural network learning algorithms are guaranteed to converge to an optimal solution
b. Neural networks work well with datasets containing noisy data.
c. Neural networks can be used for both supervised learning and unsupervised clustering
d. Neural networks can be used for applications that require a time element to be included in the data
Question 14 Find odd man out Select one:
a. K medoid
b. K mean
c. DBSCAN
d. PAM
Question 15 Given a frequent itemset L, if |L| = k, then there are Select one:
a. 2^k - 1 candidate association rules
b. 2^k candidate association rules
c. 2^k - 2 candidate association rules
d. 2^k - 2 candidate association rules
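Why the answer to Question 15 is 2^k - 2: every non-empty proper subset S of the frequent itemset L yields one candidate rule S -> (L - S), and a k-item set has 2^k subsets, two of which (the empty set and L itself) are excluded. A small enumeration:

```python
from itertools import combinations

def candidate_rules(itemset):
    items = set(itemset)
    rules = []
    for r in range(1, len(items)):                    # non-empty, proper subsets only
        for antecedent in combinations(sorted(items), r):
            consequent = items - set(antecedent)
            rules.append((set(antecedent), consequent))
    return rules

print(len(candidate_rules({"A", "B", "C"})))   # 2**3 - 2 = 6
```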
Question 16 _________ is an example of case-based learning Select one:
a. Decision trees
b. Neural networks
c. Genetic algorithm
d. K-nearest neighbor
Question 17 The average positive difference between computed and desired outcome values. Select one:
a. mean positive error
b. mean squared error
c. mean absolute error
d. root mean squared error
Question 18 Frequent item sets is Select one:
a. Superset of only closed frequent item sets
b. Superset of only maximal frequent item sets
c. Subset of maximal frequent item sets
d. Superset of both closed frequent item sets and maximal frequent item sets
Question 19 Assume that we have a dataset containing information about 200 individuals. A supervised data mining session has discovered the following rule: IF age < 30 & credit card insurance = yes THEN life insurance = yes, with Rule Accuracy: 70% and Rule Coverage: 63%. How many individuals in the class life insurance = no have credit card insurance and are less than 30 years old? Select one:
a. 63
b. 30
c. 38
d. 70
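The arithmetic behind Question 19, plugging in the numbers given (200 individuals, 63% coverage, 70% accuracy):

```python
population = 200
covered = 0.63 * population     # individuals matching the rule antecedent: 126
correct = 0.70 * covered        # of those, life insurance = yes: about 88
incorrect = covered - correct   # of those, life insurance = no: about 38
print(round(incorrect))         # 38
```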
Question 20 Use the three-class confusion matrix below to answer: what percent of the instances were correctly classified?
                  Computed Class 1  Computed Class 2  Computed Class 3
  Actual Class 1         10                 5                 3
  Actual Class 2          5                15                 3
  Actual Class 3          2                 2                 5
Select one:
a. 60
b. 40
c. 50
d. 30
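Working through Question 20: the correctly classified instances lie on the diagonal of the confusion matrix, so the accuracy is (10 + 15 + 5) / 50 = 60%. In code:

```python
confusion = [
    [10, 5, 3],   # actual Class 1
    [5, 15, 3],   # actual Class 2
    [2, 2, 5],    # actual Class 3
]
correct = sum(confusion[i][i] for i in range(3))
total = sum(sum(row) for row in confusion)
print(100 * correct / total)   # 60.0
```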
Question 21 Which of the following is cluster analysis? Select one:
a. Simple segmentation
b. Grouping similar objects
c. Labeled classification
d. Query results grouping
Question 22 A good clustering method will produce high quality clusters with Select one:
a. high inter class similarity
b. low intra class similarity
c. high intra class similarity
d. no inter class similarity
Question 23 Which two parameters are needed for DBSCAN Select one:
a. Min threshold
b. Min points and eps
c. Min sup and min confidence
d. Number of centroids
Question 24 Which statement is true about neural network and linear regression models? Select one:
a. Both techniques build models whose output is determined by a linear sum of weighted input attribute values.
b. The output of both models is a categorical attribute value.
c. Both models require numeric attributes to range between 0 and 1.
d. Both models require input attributes to be numeric.
Question 25 In the Apriori algorithm, if there are 100 frequent 1-itemsets, then the number of candidate 2-itemsets is Select one:
a. 100
b. 4950
c. 200
d. 5000
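The count in Question 25 is simply the number of ways to pair up 100 frequent 1-itemsets: C(100, 2) = 100 x 99 / 2 = 4950.

```python
from math import comb

print(comb(100, 2))   # 4950
```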
Question 26 Significant Bottleneck in the Apriori algorithm is Select one:
a. Finding frequent itemsets
b. Pruning
c. Candidate generation
d. Number of iterations
Question 27 The concept of core, border and noise points falls under which category? Select one:
a. DENCLUE
b. Subspace clustering
c. Grid based
d. DBSCAN
Question 28 The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you? Select one:
a. The attributes show a linear relationship
b. The attributes are not linearly related.
c. As the value of one attribute increases the value of the second attribute also increases.
d. As the value of one attribute decreases the value of the second attribute increases.
Question 29 Machine learning techniques differ from statistical techniques in that machine learning methods Select one:
a. are better able to deal with missing and noisy data
b. typically assume an underlying distribution for the data
c. have trouble with large-sized datasets
d. are not able to explain their behavior.
Question 30 The probability of a hypothesis before the presentation of evidence. Select one:
a. a priori
b. posterior
c. conditional
d. subjective
Question 31 KDD represents extraction of Select one:
a. data
b. knowledge
c. rules
d. model
Question 32 Which statement about outliers is true? Select one:
a. Outliers should be part of the training dataset but should not be present in the test data.
b. Outliers should be identified and removed from a dataset.
c. The nature of the problem determines how outliers are used.
d. Outliers should be part of the test dataset but should not be present in the training data.
Question 33 The most general form of distance is Select one:
a. Manhattan
b. Eucledian
c. Mean
d. Minkowski
Question 34 Arbitrary shaped clusters can be found by using Select one:
a. Density methods
b. Partitional methods
c. Hierarchical methods
d. Agglomerative
Question 35 Which Association Rule would you prefer Select one:
a. High support and medium confidence
b. High support and low confidence
c. Low support and high confidence
d. Low support and low confidence
Question 36 With Bayes theorem the probability of hypothesis H, specified by P(H), is referred to as Select one:
a. a conditional probability
b. an a priori probability
c. a bidirectional probability
d. a posterior probability
Question 37 In a rule-based classifier, if there is a rule for each combination of attribute values, what do you call that rule set R? Select one:
a. Exhaustive
b. Inclusive
c. Comprehensive
d. Mutually exclusive
Question 38 The apriori property means Select one:
a. If a set cannot pass a test, its supersets will also fail the same test
b. To decrease the efficiency, do level-wise generation of frequent item sets
c. To improve the efficiency, do level-wise generation of frequent item sets
d. If a set can pass a test, its supersets will fail the same test
Question 39 If an item set ‘XYZ’ is a frequent item set, then all subsets of that frequent item set are Select one:
a. Undefined
b. Not frequent
c. Frequent
d. Can not say
Question 40 Clustering is ___________ and is an example of ____________ learning Select one:
a. Predictive and supervised
b. Predictive and unsupervised
c. Descriptive and supervised
d. Descriptive and unsupervised
Question 41 The probability that a person owns a sports car given that they subscribe to automotive magazine is 40%. We also know that 3% of the adult population subscribes to automotive magazine. The probability of a person owning a sports car given that they don't subscribe to automotive magazine is 30%. Use this information to compute the probability that a person subscribes to automotive magazine given that they own a sports car Select one:
a. 0.0368
b. 0.0396
c. 0.0389
d. 0.0398
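Questions 41 and 50 are the same Bayes-theorem exercise; plugging the given probabilities into the law of total probability and Bayes' rule:

```python
p_car_given_sub = 0.40     # P(sports car | subscribes)
p_sub = 0.03               # P(subscribes)
p_car_given_nosub = 0.30   # P(sports car | does not subscribe)

p_car = p_car_given_sub * p_sub + p_car_given_nosub * (1 - p_sub)  # total probability
p_sub_given_car = p_car_given_sub * p_sub / p_car                  # Bayes' theorem
print(round(p_sub_given_car, 4))   # 0.0396
```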
Question 42 Simple regression assumes a __________ relationship between the input attribute and output attribute. Select one:
a. quadratic
b. inverse
c. linear
d. reciprocal
Question 43 Which of the following algorithms comes under classification Select one:
a. Apriori
b. Brute force
c. DBSCAN
d. K-nearest neighbor
Question 44 Hierarchical agglomerative clustering is typically visualized as? Select one:
a. Dendrogram
b. Binary trees
c. Block diagram
d. Graph
Question 45 The _______ step eliminates the extensions of (k-1)-itemsets which are not found to be frequent, from being considered for counting support Select one:
a. Partitioning
b. Candidate generation
c. Itemset eliminations
d. Pruning
Question 46 To determine association rules from frequent item sets Select one:
a. Only minimum confidence needed
b. Neither support nor confidence needed
c. Both minimum support and confidence are needed
d. Minimum support is needed
Question 47 What is the final resultant cluster size in Divisive algorithm, which is one of the hierarchical clustering approaches? Select one:
a. Zero
b. Three
c. singleton
d. Two
Question 48 If {A,B,C,D} is a frequent itemset, which of the following candidate rules is not possible? Select one:
a. C –> A
b. D –>ABCD
c. A –> BC
d. B –> ADC
Question 49 Which Association Rule would you prefer Select one:
a. High support and low confidence
b. Low support and high confidence
c. Low support and low confidence
d. High support and medium confidence
Question 50 The probability that a person owns a sports car given that they subscribe to automotive magazine is 40%. We also know that 3% of the adult population subscribes to automotive magazine. The probability of a person owning a sports car given that they don’t subscribe to automotive magazine is 30%. Use this information to compute the probability that a person subscribes to automotive magazine given that they own a sports car
Select one:
a. 0.0398
b. 0.0389
c. 0.0368
d. 0.0396
Unit 5:
1. What is true about Data Visualization?
A. Data Visualization is used to communicate information clearly and efficiently to users by the usage of information graphics such as tables and charts. B. Data Visualization helps users in analyzing a large amount of data in a simpler way. C. Data Visualization makes complex data more accessible, understandable, and usable. D. All of the above
2. Data can be visualized using?
A. graphs B. charts C. maps D. All of the above
3. Data visualization is also an element of the broader _____________.
A. deliver presentation architecture B. data presentation architecture C. dataset presentation architecture D. data process architecture
4. Which method shows hierarchical data in a nested format?
A. Treemaps B. Scatter plots C. Population pyramids D. Area charts
5. Which is used to inference for 1 proportion using normal approx?
A. fisher.test() B. chisq.test() C. Lm.test() D. prop.test()
6. Which is used to find the factor congruence coefficients?
A. factor.mosaicplot B. factor.xyplot C. factor.congruence D. factor.cumsum
7. Which of the following is tool for checking normality?
A. qqline() B. qline() C. anova() D. lm()
8. Which of the following is false?
A. Data visualization includes the ability to absorb information quickly B. Data visualization is another form of visual art C. Data visualization decreases the insights and leads to slower decisions D. None of the above
9. Common use cases for data visualization include?
A. Politics B. Sales and marketing C. Healthcare D. All of the above
10. Which of the following plots are often used for checking randomness in time series?
A. Autocausation B. Autorank C. Autocorrelation D. None of the above
11. Which are pros of data visualization?
A. It can be accessed quickly by a wider audience. B. It can misrepresent information C. It can be distracting D. None Of the above
12. Which are cons of data visualization?
A. It conveys a lot of information in a small space. B. It makes your report more visually appealing.
C. visual data is distorted or excessively used. D. None Of the above
13. Which of the intricate techniques is not used for data visualization?
A. Bullet Graphs B. Bubble Clouds C. Fever Maps D. Heat Maps
14. Which one of the following is the most basic and commonly used technique?
A. Line charts B. Scatter plots C. Population pyramids D. Area charts
15. Which is used to query and edit graphical settings?
A. anova() B. par() C. plot() D. cum()
16. Which of the following methods makes a vector of repeated values?
A. rep() B. data() C. view() D. read()
17. Which function calls the lower-level function lm.fit?
A. lm() B. col.max
C. par D. histo
18. Which of the following lists names of variables in a data.frame?
A. par() B. names() C. barchart() D. quantile()
19. Which of the following statements is true?
A. Scientific visualization, sometimes referred to in shorthand as SciVis B. Healthcare professionals frequently use choropleth maps to visualize important health data. C. Candlestick charts are used as trading tools and help finance professionals analyze price movements over time D. All of the above
20. ________ is used for density plots?
A. par B. lm C. kde D. C
Answer key:
Unit :1
1. Ans : D
Explanation: Data Analysis is a process of inspecting, cleaning, transforming and modelling data with the goal of discovering useful information, suggesting conclusions and supporting decision-making.
2. Ans : B
Explanation: Predictive Analytics is a major data analysis approach, not Predictive Intelligence.
3. Ans : A
Explanation: In data analysis, two main statistical methodologies are used: descriptive statistics and inferential statistics.
4. Ans : C
Explanation: In descriptive statistics, data from the entire population or a sample is summarized with numerical descriptors.
5. Ans : D
Explanation: Data Analysis was defined by the statistician John Tukey in 1961 as "procedures for analyzing data".
6. Ans : A
Explanation: answering yes/no questions about the data (hypothesis testing)
7. Ans : A
Explanation: The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
8. Ans : D
Explanation: The branch of statistics which deals with development of particular statistical methods is classified as applied statistics.
9. Ans : C
Explanation: modeling relationships within the data (E.g. regression analysis).
10 Ans : A
Explanation: Text Data Mining is the process of deriving high-quality information from text.
11. personalization
12. CRM analytics
13. business intelligence
14. database marketing
15. hosted CRM
16. All of the above
17. Cascalog
18. All of these
19. All the above
20. All of the above
21. Project Prism
22. Recall
UNIT 2:
1. b
2. c
3. A
4. c
5. a
6. b
7. a
8. c
9. B
10. D
11. A
12. B
13. D
14.A
15. D
16. A
17. B
18. C
19. A broad term, the most commonly used technique for doing factor analysis.
20. C
21. Answer: c Explanation: With fuzzy logic, set membership is defined by a value between 0 and 1, so an element can belong to a set to many different degrees.
22. Answer: a Explanation: In traditional set theory, membership is exact: an element either is in the set or is not, so there are only two crisp values, true or false. In fuzzy logic an element belongs to a set with some weight x.
23. Answer: a Explanation: Refer to the definitions of fuzzy set and crisp set.
24. Answer: a Explanation: None.
25. Answer: a Explanation: Fuzzy logic deals with linguistic variables.
26. Answer: b Explanation: Both probabilities and degrees of truth range between 0 and 1.
27. Answer: a Explanation: None.
28. Answer: d Explanation: The AND, OR and NOT operators of Boolean logic exist in fuzzy logic, usually defined as the minimum, maximum and complement.
29. Answer: a Explanation: None.
30. Answer: b Explanation: Fuzzy set theory defines fuzzy operators on fuzzy sets, but the appropriate fuzzy operator may not be known in advance. For this reason, fuzzy logic usually uses IF-THEN rules, or constructs that are equivalent, such as fuzzy associative matrices. Rules are usually expressed in the form: IF variable IS property THEN action.
31. Answer: a Explanation: Once fuzzy relations are defined, it is possible to develop fuzzy relational databases. The first fuzzy relational database, FRDB, appeared in Maria Zemankova's dissertation.
32. Answer: d Explanation: Fuzzy logic, probability and entropy are all ways to represent uncertainty; entropy is the amount of uncertainty in data, written H(data).
33. Answer: b Explanation: Ecorithms are algorithms that learn from their environments to generalize, approximate and simplify solution logic.
Unit 4:
1. K-Means clustering
2. agglomerative clustering
3. As the value of one attribute decreases the value of the second attribute increases.
4. O(tkn)
5. Y is true when X is known to be true
6. Hierarchical clustering algorithm
7. Fuzzy
8. DBSCAN
9. All attributes must be numeric
10. min-max normalization
11. increases with the size of the maximum frequent set
12. lift
13. Neural network learning algorithms are guaranteed to converge to an optimal solution
14. DBSCAN
15. 2^k - 2 candidate association rules
16. K-nearest neighbor
17. mean absolute error
18. Superset of both closed frequent item sets and maximal frequent item sets
19. 38
20. 60
21. Grouping similar objects
22. high intra class similarity
23. Min points and eps
24. Both models require input attributes to be numeric.
25. 4950
26. Candidate generation
27. DBSCAN
28. As the value of one attribute decreases the value of the second attribute increases.
29. are better able to deal with missing and noisy data
30. a priori
31. knowledge
32. The nature of the problem determines how outliers are used
33. Minkowski
34. Density methods
35. Low support and high confidence
36. an a priori probability
37. Exhaustive
38. If a set cannot pass a test, its supersets will also fail the same test
39. Frequent
40. Descriptive and unsupervised
41. 0.0396
42. linear
43. K-nearest neighbor
44. Dendrogram
45. Pruning
46. Both minimum support and confidence are needed
47. singleton
48. D –> ABCD
49. Low support and high confidence
50. 0.0396
Unit 5:
1. Ans : D
Explanation: Data Visualization is used to communicate information clearly and efficiently to users by the usage of information graphics such as tables and charts. It helps users in analyzing a large amount of data in a simpler way. It makes complex data more accessible, understandable, and usable.
2. Ans : D
Explanation: Data visualization is a graphical representation of quantitative information and data by using visual elements like graphs, charts, and maps.
3. Ans : B
Explanation: Data visualization is also an element of the broader data presentation architecture (DPA) discipline, which aims to identify, locate, manipulate, format and deliver data in the most efficient way possible.
4. Ans : A
Explanation: Treemaps are best used when multiple categories are present, and the goal is to compare different parts of a whole.
5. Ans : D
Explanation: prop.test() is used to inference for 1 proportion using normal approx.
6. Ans : C
Explanation: factor.congruence is used to find the factor congruence coefficients.
7. Ans : A
Explanation: qqline() (used together with qqnorm()) is a tool for checking normality.
8. Ans : C
Explanation: "Data visualization decreases the insights and leads to slower decisions" is the false statement.
9. Ans : D
Explanation: All option are Common use cases for data visualization.
10. Ans : C
Explanation: If the time series is random, such autocorrelations should be near zero for any and all time-lag separations.
11. Ans : A
Explanation: Pros of data visualization : it can be accessed quickly by a wider audience.
12. Ans : C
Explanation: It can be distracting if the visual data is distorted or excessively used.
13. Ans : C
Explanation: Fever maps are not used for data visualization; fever charts are used instead.
14. Ans : A
Explanation: Line charts. This is one of the most basic and common techniques used. Line charts display how variables can change over time.
15. Ans : B
Explanation: par() is used to query and edit graphical settings.
16. Ans : A
Explanation: rep() makes a vector of repeated values; data() loads a built-in dataset (often into a data.frame).
17. Ans : A
Explanation: lm calls the lower level functions lm.fit.
18. Ans : B
Explanation: names() lists (and can set) the names of the variables in a data.frame.
19. Ans : D
Explanation: All option are correct.
20. Ans : C
Explanation: kde is used for density plots.
MCQ for UNIT 5
1. Point out the correct statement. a) Hadoop is an ideal environment for extracting and transforming small volumes of data b) Hadoop stores data in HDFS and supports data compression/decompression c) The Giraph framework is less useful than a MapReduce job to solve graph and machine learning d) None of the mentioned
2. Which of the following genres does Hadoop produce? a) Distributed file system b) JAX-RS c) Java Message Service d) Relational Database Management System
3. Which of the following platforms does Hadoop run on? a) Bare metal b) Debian c) Cross-platform d) Unix-like
4. Hadoop achieves reliability by replicating the data across multiple hosts and hence does not require ________ storage on hosts. a) RAID b) Standard RAID levels c) ZFS d) Operating system
5. The Hadoop list includes the HBase database, the Apache Mahout ________ system, and matrix operations. a) Machine learning b) Pattern recognition c) Statistical classification d) Artificial intelligence
6. As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including _______________ a) Improved data storage and information retrieval b) Improved extract, transform and load features for data integration c) Improved data warehousing functionality d) Improved security, workload management, and SQL support
7. Point out the correct statement. a) Hadoop do need specialized hardware to process the data b) Hadoop 2.0 allows live stream processing of real-time data c) In the Hadoop programming framework output files are divided into lines or records d) None of the mentioned
8. According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop? a) Big data management and data mining b) Data warehousing and business intelligence c) Management of Hadoop clusters d) Collecting and storing unstructured data
9. Hadoop is a framework that works with a variety of related tools. Common cohorts include ____________ a) MapReduce, Hive and HBase b) MapReduce, MySQL and Google Apps c) MapReduce, Hummer and Iguana d) MapReduce, Heron and Trumpet
10. Point out the wrong statement. a) Hadoop processing capabilities are huge and its real advantage lies in the ability to process terabytes & petabytes of data b) Hadoop uses a programming model called “MapReduce”, all the programs should conform to this model in order to work on the Hadoop platform c) The programming model, MapReduce, used by Hadoop is difficult to write and test d) All of the mentioned
11. What was Hadoop named after? a) Creator Doug Cutting’s favorite circus act b) Cutting’s high school rock band c) The toy elephant of Cutting’s son d) A sound Cutting’s laptop made during Hadoop development
12. All of the following accurately describe Hadoop, EXCEPT ____________ a) Open-source b) Real-time c) Java-based d) Distributed computing approach
13. __________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data. a) MapReduce b) Mahout
c) Oozie d) All of the mentioned
14. __________ has the world’s largest Hadoop cluster. a) Apple b) Datamatics c) Facebook d) None of the mentioned
15. Facebook Tackles Big Data With _______ based on Hadoop. a) ‘Project Prism’ b) ‘Prism’ c) ‘Project Big’ d) ‘Project Data’
16. ________ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets. a) Pig Latin b) Oozie c) Pig d) Hive
17. Point out the correct statement. a) Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data b) Hive is a relational database with SQL support c) Pig is a relational database with SQL support d) All of the mentioned
18. Hive also support custom extensions written in ____________ a) C# b) Java c) C d) C++
19. Point out the wrong statement. a) Elastic MapReduce (EMR) is Facebook’s packaged Hadoop offering b) Amazon Web Service Elastic MapReduce (EMR) is Amazon’s packaged Hadoop offering c) Scalding is a Scala API on top of Cascading that removes most Java boilerplate d) All of the mentioned
20. ___________ is general-purpose computing model and runtime system for distributed data analytics. a) Mapreduce b) Drill
c) Oozie d) None of the mentioned
21. The Pig Latin scripting language is not only a higher-level data flow language but also has operators similar to ____________ a) SQL b) JSON c) XML d) All of the mentioned
22. _______ jobs are optimized for scalability but not latency. a) Mapreduce b) Drill c) Oozie d) Hive
23. A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the JobTracker. a) MapReduce b) Mapper c) TaskTracker d) JobTracker
24. Point out the correct statement. a) MapReduce tries to place the data and the compute as close as possible b) Map Task in MapReduce is performed using the Mapper() function c) Reduce Task in MapReduce is performed using the Map() function d) All of the mentioned
25. ___________ part of the MapReduce is responsible for processing one or more chunks of data and producing the output results. a) Maptask b) Mapper c) Task execution d) All of the mentioned
26. _________ function is responsible for consolidating the results produced by each of the Map() functions/tasks. a) Reduce b) Map c) Reducer d) All of the mentioned
27. ________ is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer.
a) Hadoop Strdata b) Hadoop Streaming c) Hadoop Stream d) None of the mentioned
28. __________ maps input key/value pairs to a set of intermediate key/value pairs. a) Mapper b) Reducer c) Both Mapper and Reducer d) None of the mentioned
29. The number of maps is usually driven by the total size of ____________ a) inputs b) outputs c) tasks d) None of the mentioned
30. Running a ___________ program involves running mapping tasks on many or all of the nodes in our cluster. a) MapReduce b) Map c) Reducer d) All of the mentioned
31. A ________ serves as the master and there is only one NameNode per cluster. a) Data Node b) NameNode c) Data block d) Replication
32. Point out the correct statement. a) DataNode is the slave/worker node and holds the user data in the form of Data Blocks b) Each incoming file is broken into 32 MB by default c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance d) None of the mentioned
33. HDFS works in a __________ fashion. a) master-worker b) master-slave c) worker/slave d) all of the mentioned
34. Point out the wrong statement. a) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file level b) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode c) User data is stored on the local file system of DataNodes d) DataNode is aware of the files to which the blocks stored on it belong to
35. Which of the following scenario may not be a good fit for HDFS? a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file b) HDFS is suitable for storing data related to applications requiring low latency data access c) HDFS is suitable for storing data related to applications requiring low latency data access d) None of the mentioned
36. The need for data replication can arise in various scenarios like ____________ a) Replication Factor is changed b) DataNode goes down c) Data Blocks get corrupted d) All of the mentioned
37. ________ is the slave/worker node and holds the user data in the form of Data Blocks. a) DataNode b) NameNode c) Data block d) Replication
38. HDFS provides a command line interface called __________ used to interact with HDFS. a) “HDFS Shell” b) “FS Shell” c) “DFS Shell” d) None of the mentioned
39. HDFS is implemented in _____________ programming language. a) C++ b) Java c) Scala d) None of the mentioned
40. For YARN, the ___________ Manager UI provides host and port information. a) Data Node b) NameNode
c) Resource d) Replication
41. During start up, the ___________ loads the file system state from the fsimage and the edits log file. a) DataNode b) NameNode c) ActionNode d) None of the mentioned
42. Point out the correct statement. a) A Hadoop archive maps to a file system directory b) Hadoop archives are special format archives c) A Hadoop archive always has a *.har extension d) All of the mentioned
43. Using Hadoop Archives in __________ is as easy as specifying a different input filesystem than the default file system. a) Hive b) Pig c) MapReduce d) All of the mentioned
44. Pig operates in mainly how many nodes? a) Two b) Three c) Four d) Five
45. Point out the correct statement. a) You can run Pig in either mode using the “pig” command b) You can run Pig in batch mode using the Grunt shell c) You can run Pig in interactive mode using the FS shell d) None of the mentioned
46. You can run Pig in batch mode using __________ a) Pig shell command b) Pig scripts c) Pig options d) All of the mentioned
47. Pig Latin statements are generally organized in one of the following ways? a) A LOAD statement to read data from the file system b) A series of “transformation” statements to process the data
c) A DUMP statement to view results or a STORE statement to save the results d) All of the mentioned
48. Point out the wrong statement. a) To run Pig in local mode, you need access to a single machine b) The DISPLAY operator will display the results to your terminal screen c) To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation d) All of the mentioned
49. Which of the following function is used to read data in PIG? a) WRITE b) READ c) LOAD d) None of the mentioned
50. You can run Pig in interactive mode using the ______ shell. a) Grunt b) FS c) HDFS d) None of the mentioned
51. HBase is a distributed ________ database built on top of the Hadoop file system. a) Column-oriented b) Row-oriented c) Tuple-oriented d) None of the mentioned
52. Point out the correct statement. a) HDFS provides low latency access to single rows from billions of records (Random access) b) HBase sits on top of the Hadoop File System and provides read and write access c) HBase is a distributed file system suitable for storing large files d) None of the mentioned
53. HBase is ________ defines only column families. a) Row Oriented b) Schema-less c) Fixed Schema d) All of the mentioned
54. Apache HBase is a non-relational database modeled after Google’s _________ a) BigTop b) Bigtable
c) Scanner d) FoundationDB
55. Point out the wrong statement. a) HBase provides only sequential access to data b) HBase provides high latency batch processing c) HBase internally provides serialized access d) All of the mentioned
56. The _________ Server assigns regions to the region servers and takes the help of Apache ZooKeeper for this task. a) Region b) Master c) Zookeeper d) All of the mentioned
57. Which of the following command provides information about the user? a) status b) version c) whoami d) user
58. Which of the following command does not operate on tables? a) enabled b) disabled c) drop d) all of the mentioned
59. _________ command fetches the contents of a row or a cell. a) select b) get c) put d) none of the mentioned
60. HBaseAdmin and ____________ are the two important classes in this package that provide DDL functionalities. a) HTableDescriptor b) HDescriptor c) HTable d) HTabDescriptor
61. Which of the following is not a NoSQL database? a) SQL Server b) MongoDB
c) Cassandra d) None of the mentioned
62. Point out the correct statement. a) Documents can contain many different key-value pairs, or key-array pairs, or even nested documents b) MongoDB has official drivers for a variety of popular programming languages and development environments c) When compared to relational databases, NoSQL databases are more scalable and provide superior performance d) All of the mentioned
63. Which of the following is a NoSQL Database Type? a) SQL b) Document databases c) JSON d) All of the mentioned
64. Which of the following is a wide-column store? a) Cassandra b) Riak c) MongoDB d) Redis
65. Point out the wrong statement. a) Non Relational databases require that schemas be defined before you can add data b) NoSQL databases are built to allow the insertion of data without a predefined schema c) NewSQL databases are built to allow the insertion of data without a predefined schema d) All of the mentioned
66. Most NoSQL databases support automatic __________ meaning that you get high availability and disaster recovery. a) processing b) scalability c) replication d) all of the mentioned
67. Which of the following are the simplest NoSQL databases? a) Key-value b) Wide-column c) Document d) All of the mentioned
68. ________ stores are used to store information about networks, such as social connections. a) Key-value b) Wide-column c) Document d) Graph
69. NoSQL databases is used mainly for handling large volumes of ______________ data. a) unstructured b) structured c) semi-structured d) all of the mentioned
70. Point out the wrong statement? a) Key feature of R was that its syntax is very similar to S b) R runs only on Windows computing platform and operating system c) R has been reported to be running on modern tablets, phones, PDAs, and game consoles d) R functionality is divided into a number of Packages
71. R functionality is divided into a number of ________ a) Packages b) Functions c) Domains d) Classes
72. Which Package contains most fundamental functions to run R? a) root b) child c) base d) parent
73. Point out the wrong statement? a) One nice feature that R shares with many popular open source projects is frequent releases b) R has sophisticated graphics capabilities c) S’s base graphics system allows for very fine control over essentially every aspect of a plot or graph d) All of the mentioned
74. Which of the following is a base package for R language? a) util b) lang
c) tools d) spatial
75. Which of the following is “Recommended” package in R? a) util b) lang c) stats d) spatial
76. What is the output of getOption(“defaultPackages”) in R studio? a) Installs a new package b) Shows default packages in R c) Error d) Nothing will print
77. Which of the following is used for Statistical analysis in R language? a) RStudio b) Studio c) Heck d) KStudio
78. In R language, a vector is defined that it can only contain objects of the ________ a) Same class b) Different class c) Similar class d) Any class
79. A list is represented as a vector but can contain objects of ___________ a) Same class b) Different class c) Similar class d) Any class
80. How can we define ‘undefined value’ in R language? a) Inf b) Sup c) Und d) NaN
81. What is NaN called? a) Not a Number b) Not a Numeric c) Number and Number d) Number a Numeric
82. How can we define ‘infinity’ in R language? a) Inf b) Sup c) Und d) NaN
83. Which one of the following is not a basic datatype? a) Numeric b) Character c) Data frame d) Integer
84. Matrices can be created by row-binding with the help of which of the following functions? a) rjoin() b) rbind() c) rowbind() d) rbinding()
85. What is the function used to test objects (returns a logical operator) if they are NA? a) is.na() b) is.nan() c) as.na() d) as.nan()
86. What is the function used to test objects (returns a logical operator) if they are NaN? a) as.nan() b) is.na() c) as.na() d) is.nan()
87. What is the function to set column names for a matrix? a) names() b) colnames() c) col.names() d) column name cannot be set for a matrix
88. The most convenient way to use R is at a graphics workstation running a ________ system. a) windowing b) running c) interfacing d) matrix
89. Point out the wrong statement? a) Setting up a workstation to take full advantage of the customizable features of R is a
straightforward thing b) q() is used to quit the R program c) R has an inbuilt help facility similar to the man facility of UNIX d) Windows versions of R have other optional help systems also
90. Point out the wrong statement? a) Windows versions of R have other optional help system also b) The help.search command (alternatively ??) allows searching for help in various ways c) R is case insensitive as are most UNIX based packages, so A and a are different symbols and would refer to different variables d) $ R is used to start the R program
91. Elementary commands in R consist of either _______ or assignments. a) utilstats b) language c) expressions d) packages
92. How do you install a package “for” and all of the other packages on which “for” depends? a) install.packages (for, depends = TRUE) b) R.install.packages (“for”, depends = TRUE) c) install.packages (“for”, depends = TRUE) d) install (“for”, depends = FALSE)
93. __________ function is used to watch for all available packages in library. a) lib() b) fun.lib() c) libr() d) library()
94. Attributes of an object (if any) can be accessed using the ______ function. a) objects() b) attrib() c) attributes() d) obj()
95. R objects can have attributes, which are like ________ for the object. a) metadata b) features c) expression d) dimensions
96. ________ generate random Normal variates with a given mean and standard deviation. a) dnorm b) rnorm
c) pnorm d) rpois
97. Point out the correct statement? a) R comes with a set of pseudo-random number generators b) Random number generators cannot be used to model random inputs c) Statistical procedure does not require random number generation d) For each probability distribution there are typically three functions
98. ______ evaluate the cumulative distribution function for a Normal distribution. a) dnorm b) rnorm c) pnorm d) rpois
99. _______ generate random Poisson variates with a given rate. a) dnorm b) rnorm c) pnorm d) rpois
100. Point out the wrong statement? a) For each probability distribution there are typically three functions b) For each probability distribution there are typically four functions c) r function is sufficient for simulating random numbers d) R comes with a set of pseudo-random number generators
101. _________ is the most common probability distribution to work with. a) Gaussian b) Parametric c) Paradox d) Simulation
102. Point out the correct statement? a) When simulating any random numbers it is not essential to set the random number seed b) It is not possible to generate random numbers from other probability distributions like the Poisson c) You should always set the random number seed when conducting a simulation d) Statistical procedure does not require random number generation
103. _______ function is used to simulate binary random variables. a) dnorm b) rbinom() c) binom() d) rpois
104. Point out the wrong statement? a) Drawing samples from specific probability distributions can be done with “s” functions b) The sample() function draws randomly from a specified set of (scalar) objects allowing you to sample from arbitrary distributions of numbers c) The sampling() function draws randomly from a specified set of objects d) You should always set the random number seed when conducting a simulation
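For readers who want to try the ideas behind questions 96-104 hands-on, the short sketch below reproduces them in Python rather than R; numpy and scipy (assumed to be installed) are only analogues of R's rnorm/pnorm/rpois/rbinom/sample, not the R functions themselves, and the seed call plays the role of set.seed().

```python
import numpy as np
from scipy.stats import norm

np.random.seed(42)  # analogous to set.seed(): makes the simulation reproducible

normals  = np.random.normal(loc=10, scale=2, size=5)  # like rnorm(5, mean = 10, sd = 2)
cdf_val  = norm.cdf(1.96)                             # like pnorm(1.96): Normal cumulative distribution
poissons = np.random.poisson(lam=3, size=5)           # like rpois(5, lambda = 3)
coins    = np.random.binomial(n=1, p=0.5, size=5)     # like rbinom(5, 1, 0.5): binary random variables
draws    = np.random.choice([2, 4, 6, 8], size=5)     # like sample(): draw from an arbitrary set of objects

print(normals, cdf_val, poissons, coins, draws)
```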
105. _______ grammar makes a clear distinction between your data and what gets displayed on the screen or page. a) ggplot1 b) ggplot2 c) d3.js d) ggplot3
106. Point out the wrong statement? a) mean_se is used to calculate mean and standard errors on either side b) hmisc wraps up a selection of summary functions from Hmisc to make it easy to use c) plot is used to create a scatterplot matrix (experimental) d) translate_qplot_base is used for translating between qplot and base graphics
107. Which of the following cuts numeric vector into intervals of equal length? a) cut_interval b) cut_time c) cut_number d) cut_date
108. Which of the following is a plot to investigate the order in which observations were recorded? a) ggplot b) ggsave c) ggpcp d) ggorder
109. ________ is used for translating between qplot and base graphics. a) translate_qplot_base b) translate_qplot_gpl c) translate_qplot_lattice d) translate_qplot_ggplot
110. Which of the following is a discrete scale constructor? a) discrete_scale b) ggpcp c) ggfluctuation d) ggmissing
111. Which of the following creates fluctuation plot? a) ggmissplot b) ggmissing c) ggfluctuation d) ggpcp
112. __________ create a complete ggplot appropriate to a particular data type. a) autoplot b) is.ggplot c) printplot d) qplot_ggplot
113. Which of the following creates a new ggplot plot from a data frame? a) qplot_ggplot b) ggplot.data.frame c) ggfluctuation d) ggmissplot
Department of Information Technology
DATA ANALYTICS – KIT601 – Question Bank
UNIT-1
1. Data originally collected in the process of investigation are known as a) Foreign data b) Primary data c) Third data d) Secondary data e) None of these
2. Statistical enquiry means a) It is science for knowledge b) Search for knowledge c) Collection of anything d) Search for knowledge with the help of statistical methods e) None of these
3. Cluster sampling means a) Sample is divided into a number of sub-groups b) Samples are selected at regular intervals c) Sample is obtained by conscious selection d) Universe is divided into groups e) None of these
4. What is Secondary data? a) Data collected in the process of investigation b) Data collected from some other agency c) Data collected from questionnaire of a person d) Both A & B e) None of these
5. What is information? a) Raw facts b) Processed data c) Understanding facts d) Knowing action on data e) None of these
6. Data about rocks is an example of a) Time dependent data b) Time Independent data c) Location dependent data d) Location independent data e) None of these
7. Range on temperature scale is termed as a) Nominal data b) Ordinal data
c) Interval data d) Ratio data e) None of these
8. Data in XML and CSV format is an example of a) Structure data b) Un-structure data c) Semi-structure data d) Both A & B e) None of these
9. Which is not a characteristic of data? a) Accuracy b) Consistency c) Granularity d) Redundant e) None of these
10. Hadoop is a framework that works with a variety of related tools. Common cohorts include: a) MapReduce, Hive and HBase b) MapReduce, MySQL and Google Apps c) MapReduce, Hummer and Iguana d) MapReduce, Heron and Trumpet e) None of these
11. Which is not a V in Big Data? a) Volume b) Veracity c) Vigor d) Velocity e) None of these
12. Which is not true about Traditional decision making? a) Does not require human intervention b) Takes a long time to come to decision c) Lacks systematic linkage in planning d) Provides limited scope of data analytics e) None of these
13. Cloudera is a product of a) Microsoft b) Apache c) Google d) Facebook e) None of these
14. What is not true about MPP architecture? a) Tightly coupled nodes b) High speed connection among nodes c) Disks are not shared
d) Uses a lot of processors e) None of these
15. The process of organizing and summarizing data in an easily readable format to communicate important information is known as a) Analysis b) Reporting c) Clustering d) Mining e) None of these
16. Out of the following which is not a type of report a) Canned b) Dashboard c) Ad hoc response d) Alerts e) None of these
17. Data Analysis is a process of? a) inspecting data b) cleaning data c) transforming data d) All of above e) None of these
18. Which of the following is not a major data analysis approaches? a) Data Mining b) Predictive Intelligence c) Business Intelligence d) Text Analytics e) None of these
19. How many main statistical methodologies are used in data analysis? a) 2 b) 3 c) 4 d) 5 e) None of these
20. Which of the following is true about regression analysis? a) answering yes/no questions about the data b) estimating numerical characteristics of the data c) modeling relationships within the data d) describing associations within the data e) None of these
21. __________ may be defined as the data objects that do not comply with the general behavior or model of the data available. a) Outlier Analysis b) Evolution Analysis
c) Prediction d) Classification e) None of these
22. What is the use of data cleaning? a) to remove the noisy data b) correct the inconsistencies in data c) transformations to correct the wrong data. d) All of the above e) None of these
23. In data mining, this is a technique used to predict future behavior and anticipate the consequences of change. a) predictive technology b) disaster recovery c) phase change d) predictive modeling e) None of these
24. What are the main components of Big Data? a) MapReduce b) HDFS c) HBASE d) All of these e) None of these
25. ———- is data that depends on a data model and resides in a fixed field within a record. a) Structured data b) Un-Structured data c) Semi-Structured data d) Scattered e) None of these
26. —————- is about developing code to enable the machine to learn to perform tasks; its basic principle is the automatic modeling of the underlying processes that have generated the collected data. a) Data Science b) Data Analytics c) Data Mining d) Data Warehousing e) None of these
27. —————– is an example of human generated unstructured data. a) YouTube data b) Satellite data c) Sensor data d) Seismic imagery data e) None of these
28. Height is an example of which type of attribute a) Nominal b) Binary c) Ordinal d) Numeric e) None of these
29. ————- type of analytics describes what happened in the past a) Descriptive b) Prescriptive c) Predictive d) Probability e) None of these
30. ————– data does not fit into a data model due to variations in contents a) Structured data b) Un-Structured data c) Semi Structured data d) Both B & C e) None of these
UNIT-2
31. A and B are two events. If P(A, B) decreases while P(A) increases, which of the following is true? a) P(A|B) decreases b) P(B|A) decreases c) P(B) decreases d) All of above e) None of these
32. Suppose we like to calculate P(H|E, F) and we have no conditional independence information. Which of the following sets of numbers are sufficient for the calculation? a) P(E, F), P(H), P(E|H), P(F|H) b) P(E, F), P(H), P(E, F|H) c) P(H), P(E|H), P(F|H) d) P(E, F), P(E|H), P(F|H) e) None of these
33. Suppose you are given an EM algorithm that finds maximum likelihood estimates for a model with latent variables. You are asked to modify the algorithm so that it finds MAP estimates instead. Which step or steps do you need to modify? a) Expectation b) Maximization c) No modification necessary d) Both A & B e) None of these
34. Compared to the variance of the Maximum Likelihood Estimate (MLE), the variance of the Maximum A Posteriori (MAP) estimate is ________ a) higher b) same c) lower d) it could be any of the above e) None of these
35. Bayesian methods are important to our study of machine learning because they provide a useful perspective for understanding many learning algorithms that do not ............................ manipulate probabilities. a) explicitly b) implicitly c) both a & b d) approximately e) None of these
36. The results that we get after we apply Bayesian Theorem to a problem are, a) 100% accurate b) Estimated values c) Wrong values d) Only positive values e) None of these
37. The previous probabilities in Bayes theorem that are changed with the help of new available information are classified as a) independent probabilities b) posterior probabilities c) interior probabilities d) dependent probabilities e) None of these
38. In contrast to the naive Bayes classifier, Bayesian belief networks allow stating conditional independence assumptions that apply to ............................... of the variables. a) subsets b) super sets c) empty set d) All of above e) None of these
39. The naive Bayes classifier applies to learning tasks where each instance x is described by a conjunction of attribute values and where the target function f ( x ) can take on ................. value from some................... set V. a) one, finite b) any, infinite c) one, infinite d) any, finite e) None of these
40. Bayes rule can be used to........................conditioned on one piece of evidence. a) solve queries b) increase complexity of a query c) decrease complexity of a query d) answer probabilistic queries e) None of these
41. Among which of the following mentioned statements can the Bayesian probability be applied? (i) In the cases, where we have one event (ii) In the cases, where we have two events (iii) In the cases, where we have three events (iv) In the cases, where we have more than three events
Options:
a) Only iv. b) All i., ii., iii. and iv. c) ii. and iv. d) Only ii. e) None of these
42. How the Bayesian network can be used to answer any query? a) Full distribution b) Joint distribution
c) Partial distribution d) All of the mentioned above e) None of these
43. Which of the following methods do we use to find the best fit line for data in Linear Regression? a) Least Square Error b) Maximum Likelihood c) Logarithmic Loss d) Both A and B e) None of these
44. Linear Regression is a ..................... machine learning algorithm. a) supervised b) unsupervised c) reinforcement d) Both A & B e) None of these
45. Which of the following statements is true about outliers in linear regression? a) Linear regression is not sensitive to outliers b) Linear regression is sensitive to outliers c) Can’t say d) There are no outliers e) None of these
46. Which of the following sentence is FALSE regarding regression? a) It relates inputs to outputs. b) It is used for prediction. c) It may be used for interpretation. d) It discovers causal relationships. e) None of these
47. Which of the following methods do we use to best fit the data in Logistic Regression? a) Least Square Error b) Maximum Likelihood c) Jaccard distance d) Both A & B e) None of these
48. Which of the following options is true? a) Linear Regression error values have to be normally distributed but in the case of Logistic Regression it is not the case b) Logistic Regression error values have to be normally distributed but in the case of Linear Regression it is not the case c) Both Linear Regression and Logistic Regression error values have to be normally distributed d) Neither Linear Regression nor Logistic Regression error values have to be normally distributed e) None of these
49. A decision tree is also known as a) general tree b) binary tree c) prediction tree d) fuzzy tree e) None of these
50. The confusion matrix is a useful tool for analyzing a) Regression b) Classification c) Sampling d) Cross Validation e) None of these
51. In regression, the independent variable is also called ———– a) Regressor b) Continuous c) Regressand d) Estimated e) None of these
52. ————— searches for the linear optimal separating hyperplane for separation of the data using essential training tuples called support vectors a) Decision tree b) Association Rule Mining c) Clustering d) Support vector machines e) None of these
53. Which of the following is used as attribute selection measure in decision tree algorithms? a) Information Gain b) Posterior probability c) Prior probability d) Support e) None of these
54. ———- is an unsupervised technique aiming to divide a multivariate dataset into clusters or groups. a) KNN b) SVM c) Regression d) Cluster Analysis e) None of these
55. A perfect negative correlation is signified by ————- a) 1 b) -1 c) 0 d) 2
e) None of these
56. ———— rule mining is a technique to identify underlying relations between different items. a) Classification b) Regression c) Clustering d) Association e) None of these
57. ———– is a supervised machine learning algorithm that outputs an optimal hyperplane for given labeled training data a) KNN b) SVM c) Regression d) Decision Tree e) None of these
58. Which of the following is a measure used in decision trees when selecting the splitting criterion that partitions the data in the best possible manner? a) Probability b) Gini Index c) Regression d) Confusion matrix e) None of these
59. Which of the following is not a type of clustering algorithm? a) Density clustering b) K-Means clustering c) Centroid clustering d) Simple clustering e) None of these
60. —— answers the questions like ” How can we make it happen?” a) Descriptive b) Prescriptive c) Predictive d) Probability e) None of these
UNIT-3
61. A company wants to divide its customers into distinct groups to send offers; this is an example of a) Data Extraction b) Data Classification c) Data Discrimination d) Data Selection e) None of these
62. When do we use Manhattan distance in data mining? a) Dimension of the data decreases b) Dimension of the data increases c) Under fitting d) Moderate size of the dimensions e) None of these
63. When there is no impact on one variable when increase or decrease on other variable then it is ———— a) Perfect correlation b) Positive correlation c) Negative correlation d) No correlation e) None of these
64. Apriori algorithm uses breadth first search and ————structure to count candidate item sets efficiently. a) Decision tree b) Hash Tree c) Red-Black Tree d) AVL Tree e) None of these
65. To determine basic salary of an employee when his qualification is given is a ———– problem a) Correlation b) Regression c) Association d) Qualitative e) None of these
66. ———— is the step performed by a data scientist after acquiring the data. a) Data Cleansing b) Data Integration c) Data Replication d) Data loading e) None of these
67. ———– is an indication of how often the rule has been found to be true in association rule mining. a) Confidence
b) Support c) Lift d) Accuracy e) None of these
68. Which of the following statements about data streaming is true? a) Stream data is always unstructured data. b) Stream data often has a high velocity. c) Stream elements cannot be stored on disk. d) Stream data is always structured data. e) None of these
69. A Bloom filter guarantees no a) false positives b) false negatives c) false positives and false negatives d) false positives or false negatives, depending on the Bloom filter type e) None of these
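To make question 69's guarantee concrete, here is a minimal, hypothetical Bloom filter in Python (the class name, sizes, and use of salted SHA-256 digests are illustrative choices, not any fixed library API): once an element is added, all of its bit positions are set, so a membership test can never return a false negative, while hash collisions can still produce false positives.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k salted hash positions over an m-bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for salt in range(self.k):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # Every added item finds all of its bits set, so there are no false negatives;
        # unrelated items may collide on set bits, so false positives remain possible.
        return all(self.bits[pos] == 1 for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"))    # always True
print(bf.might_contain("mallory"))  # usually False, occasionally a false positive
```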
70. The FM-sketch algorithm can be used to: a) Estimate the number of distinct elements. b) Sample data with a time-sensitive window. c) Estimate the frequent elements. d) Determine whether an element has already occurred in previous stream data. e) None of these
71. The DGIM algorithm was developed to estimate the count of 1's occurring within the last k bits of a stream window N. Which of the following statements is true about the estimate of the number of 0's based on DGIM? a) The number of 0's cannot be estimated at all. b) The number of 0's can be estimated with a maximum guaranteed error. c) To estimate the number of 0's and 1's with a guaranteed maximum error, DGIM has to be employed twice, once creating buckets based on 1's and once creating buckets based on 0's. d) Only 1’s can be estimated, not 0’s e) None of these
72. What are DGIM’s maximum error boundaries? a) DGIM always underestimates the true count; at most by 25% b) DGIM either underestimates or overestimates the true count; at most by 50% c) DGIM always overestimates the count; at most by 50% d) DGIM either underestimates or overestimates the true count; at most by 25% e) None of these
73. Which algorithm should be used to approximate the number of distinct elements in a data stream? a) Misra-Gries b) Alon-Matias-Szegedy c) DGIM d) Apriori e) None of these
74. Which of the following statements about standard Bloom filters is correct? a) It is possible to delete an element from a Bloom filter. b) A Bloom filter always returns the correct result. c) It is possible to alter the hash functions of a full Bloom filter to create more space. d) A Bloom filter always returns TRUE when testing for a previously added element. e) None of these
75. ETL stands for ________________ a) Extraction transformation and loading b) Extract Taken Lend c) Enterprise Transfer Load d) Entertainment Transference Load e) None of these
76. Which of the following is not a major data analysis approaches? a) Data Mining b) Predictive Intelligence c) Business Intelligence d) Text Analytics e) None of these
77. What do you mean by a Real Time Analytics platform? a) Manages and processes data and helps timely decision making b) Helps to develop dynamic analysis applications c) Leads to evolution of non-business intelligence d) Hadoop e) None of these
78. Data Analysis is defined by the statistician? a) William S. b)Hans Peter Luhn c) Gregory Piatetsky-Shapiro d) John Tukey e)None of these
79. Which of the following is a wrong statement? a) The big volume actually represents Big Data b) Big Data is just about tons of data c) The data growth and social media explosion have changed how we look at the data d) All of these e) None of these
80. Which of the following emphasizes the discovery of previously unknown properties of the data? a) Machine Learning b) Big Data c) Data wrangling d) Data mining e) None of these
81 What are DGIM’s maximum error boundaries? a)DGIM always underestimates the true count; at most by 25% b)DGIM either underestimates or overestimates the true count; at most by 50% c)DGIM always overestimates the count; at most by 50% d)DGIM either underestimates or overestimates the true count; at most by 25% e)None of these
82 A Bloom filter guarantees no a)false positives b)false negatives c)false positives and false negatives d)false positives or false negatives, depending on the Bloom filter e)None of these
83. Which of the following statements about the standard DGIM algorithm are false? a) DGIM operates on a time-based window. b) In DGIM, the size of a bucket is always a power of two. c) The maximum number of buckets has to be chosen beforehand. d) The buckets contain the count of 1's and each 1's specific position in the stream. e)None of these
84 What are two differences between large-scale computing and big data processing? a)hardware b) Data is more suitable for finding new patterns in data than Large Scale Computing c) amount of processing time available d) amount of data processed e)None of these
85. In the Flajolet-Martin algorithm, if the stream contains n elements with m of them unique, the algorithm runs in a) O(n) time b) constant time c) O(2n) time d) O(3n) time e) None of these
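As a rough illustration of questions 70 and 85, the sketch below is a single-hash Flajolet-Martin style estimator in Python (the hash choice and the single-estimator setup are simplifying assumptions; practical implementations combine many estimators). It makes one pass over the stream, so the work is O(n), keeps only the largest number of trailing zero bits seen, and returns 2^R as the distinct-count estimate.

```python
import hashlib

def trailing_zeros(n):
    # Number of trailing zero bits in a positive integer (0 is treated as 0 here).
    count = 0
    while n > 0 and n & 1 == 0:
        n >>= 1
        count += 1
    return count

def fm_estimate(stream):
    """One-pass, constant-memory Flajolet-Martin style distinct-count estimate."""
    max_r = 0
    for item in stream:
        h = int(hashlib.sha256(str(item).encode()).hexdigest(), 16)
        max_r = max(max_r, trailing_zeros(h))
    return 2 ** max_r  # rough estimate of the number of distinct elements

stream = [1, 2, 3, 2, 1, 4, 5, 3, 2, 6]
print(fm_estimate(stream))  # estimate for the 6 distinct values (noisy with one hash)
```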
86 What are two differences between large-scale computing and big data processing? a) hardware b) Data is more suitable for finding new patterns in data than Large Scale Computing c) amount of processing time available d) number of passes made over the data e)None of these
87. What does it mean when an algorithm is said to 'scale well'? a) The running time does not increase exponentially when data becomes longer. b) The result quality goes up when the data becomes larger. c) The memory usage does not increase exponentially when data becomes larger. d) The result quality remains the same when the data becomes larger. e) None of these
89. The FM-sketch algorithm can be used to: a) Estimate the number of distinct elements. b) Sample data with a time-sensitive window. c) Estimate the frequent elements. d) Determine whether an element has already occurred in previous stream data. e) None of these
90. Which attribute is not indicative of data streaming? a) Limited amount of memory b) Limited amount of processing time c) Limited amount of input data d) Limited amount of processing power e) None of these
UNIT 4
91 Which of the following clustering type has characteristic shown in the below figure?
a) Exploratory b) Inferential c) Causal d) Hierarchical Clustering e)None of these
92 Which of the following dimension type graph is shown in the below figure?
a) one-dimensional b) two-dimensional c) three-dimensional d) four-dimensional e)None of these
93 Which of the following gave rise to need of graphs in data analysis? a)Data visualization b) Communicating results
c) Decision making d) All of the mentioned e)None of these
94. Which of the following is a characteristic of an exploratory graph? a) Made slowly b) Axes are not cleaned up c) Color is used for personal information d) All of the mentioned e) None of these
95. Color and shape are used to add dimensions to graph data. a) True b) False c) Dilemma d) Incorrect Statement e) None of these
96.Which of the following information is not given by five-number summary? a) Mean b) Median c) Mode d) All of the mentioned e)None of these
97.Which of the following is also referred to as overlayed 1D plot? a)lattice b) barplot c) gplot d) all of the mentioned e)None of these
98.Spinning plots can be used for two dimensional data. a)True b) False c)Incorrect d)Not Sure e)None of these
99 Point out the correct statement. a) coplots are one dimensional data graph b) Exploratory graphs are made quickly c) Exploratory graphs are made relatively less in number d) All of the mentioned e)None of these
100. Which of the following clustering techniques is used by the K-Means algorithm? a) Hierarchical technique b) Partitional technique c) Divisive
d) Agglomerative e) None of these
101. SON algorithm is also known as a) PCY Algorithm b) Multistage Algorithm c) Multihash Algorithm d) Partition Algorithm e) None of these
102. Which technique is used to filter unnecessary itemsets in the PCY algorithm? a) Association Rule b) Hashing Technique c) Data Mining d) Market basket e) None of these
103. In association rule mining, which of the following indicates how frequently the items occur in a dataset? a) Support b) Confidence c) Basket d) Itemset e) None of these
104. Which term indicates the degree of correlation between X and Y in a dataset, if the given association rule is X --> Y? a) Confidence b) Monotonicity c) Distinct d) Hashing e) None of these
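Questions 103 and 104 rest on two simple ratios. The toy Python sketch below (the transactions are invented purely for illustration) computes support as the fraction of baskets containing an itemset, and confidence of X -> Y as support(X ∪ Y) / support(X).

```python
# Toy market baskets, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "eggs", "milk"},
    {"eggs", "milk"},
    {"bread", "eggs"},
]

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    # confidence(X -> Y) = support(X union Y) / support(X)
    return support(set(lhs) | set(rhs)) / support(lhs)

print(support({"bread", "milk"}))       # 0.5  (2 of 4 baskets)
print(confidence({"bread"}, {"milk"}))  # 0.666...: how often milk accompanies bread
```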
105.During start up, the ___________ loads the file system state from the fsimage and the edits log file. a) DataNode b) NameNode c) ActionNode d) Data Action Node e)None of these
106. Which of the following scenarios may not be a good fit for HDFS? a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file b) HDFS is suitable for storing data related to applications requiring low latency data access c) HDFS is suitable for storing data related to applications requiring low latency data access d) HDFS is suitable for scenarios requiring multiple/simultaneous writes to the same file e) None of these
107. ________ is the slave/worker node and holds the user data in the form of Data Blocks. a) DataNode b) NameNode c) Data block d) Replication e) None of these
108. HDFS provides a command line interface called __________ used to interact with HDFS. a) “HDFS Shell” b) “FS Shell” c) “DFS Shell” d) None of the mentioned e) None of these
109. What is CLIQUE? a) CLIQUE is a grid based method for finding density based clusters in subspaces. b) CLIQUE is a click method c) used to prune non-promising cells and to improve efficiency d) used to measure distance e) None of these
110. CLIQUE stands for? a) Clustering In QUEst b) Common in Quest c) Calculate in Quest d) Click in Quest e) None of these
111. What are the approaches for high dimensional data clustering? a) Subspace clustering b) Projected clustering and Biclustering c) Data Clustering d) Space Clustering e) None of these
112. Applications of frequent itemset analysis include a) Related concepts, Plagiarism, Biomarkers b) Clustering c) Design d) Operation e) None of these
113. k-means is a ……….. based algorithm, or distance based algorithm, where we calculate the distances to assign a point to a cluster. a) Centroid b) Distance c) Neuron d) Dendron e) None of these
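Question 113 describes the centroid/distance view of k-means. The short, library-free Python sketch below (the point values and k are made up for illustration) runs the two alternating steps: assign each point to its nearest centroid by squared Euclidean distance, then move each centroid to the mean of its assigned points.

```python
import random

def kmeans(points, k, iters=10, seed=0):
    """Minimal centroid-based k-means on 2-D points."""
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            dists = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[dists.index(min(dists))].append((x, y))
        # Update step: each centroid moves to the mean of its assigned points.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids

pts = [(1, 1), (1.2, 0.8), (0.9, 1.1), (8, 8), (8.2, 7.9), (7.8, 8.1)]
print(kmeans(pts, k=2))  # two centroids, near (1, 1) and (8, 8)
```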
114. -------- is an algorithm for frequent itemset mining and association rule learning over relational databases. a) Confidence b) Apriori c) Disadvantage d) Market basket e) None of these
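Question 114's answer, Apriori, works level by level using the apriori (downward-closure) property that question 118 also touches on: a k-itemset can only be frequent if every one of its (k-1)-subsets is frequent. A minimal Python sketch, with made-up baskets and a made-up minimum support:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining with downward-closure pruning."""
    n = len(transactions)
    support = lambda s: sum(s <= t for t in transactions) / n
    items = {i for t in transactions for i in t}
    # Pass 1: frequent 1-itemsets.
    levels = [{frozenset([i]) for i in items if support({i}) >= min_support}]
    k = 2
    while levels[-1]:
        prev = levels[-1]
        # Candidate generation: join frequent (k-1)-itemsets ...
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # ... and prune any candidate that has an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        levels.append({c for c in candidates if support(c) >= min_support})
        k += 1
    return [itemset for level in levels for itemset in level]

tx = [{"bread", "milk"}, {"bread", "eggs", "milk"}, {"eggs", "milk"}, {"bread", "eggs"}]
print(apriori(tx, min_support=0.5))
```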
115. The HBase database includes the Hadoop list, the Apache Mahout ________ system, and matrix operations. a) Statistical classification b) Pattern recognition c) Machine learning d) Artificial intelligence e) All of these
116. To discover interesting relations between objects in large databases is an objective of ---- a) Frequent Set Mining b) Market basket Mining c) Association rules mining d) Confidence Gain e) None of these
117. Different methods for storing itemset counts in main memory are a) The triangular matrix method b) The triples method c) Angular method d) Square Method e) None of these
118. ------ is used to prune non-promising cells and to improve efficiency. a) Market basket b) Frequent itemset c) Support d) Apriori property e) None of these
119. Identify the algorithm in which, on the first pass, we count the items themselves and determine which items are frequent; on the second pass, we count only the pairs of items both of which were found frequent on the first pass. a) DGIM b) CURE c) Pagerank d) Apriori e) None of these
120. A resource used for sharing data globally by all nodes is a) Distributed Cache b) Centralised Cache c) Secondary memory d) Primary memory e) None of these
UNIT-5
121. Input to the ________ is the sorted output of the mappers. a) Reducer b) Mapper c) Shuffle d) All of the above e) None of these
122. Which of the following statements about data streaming is true? a)Stream data is always unstructured data. b)Stream data often has a high velocity. c)Stream elements cannot be stored on disk. d)Stream data is always structured data. e)None of these
123. The output of the ________ is not sorted in the MapReduce framework for Hadoop. a) Mapper b) Cascader c) Scalding d) None of the above e) None of these
124. Which of the following phases occur simultaneously? a) Reduce and Sort b) Shuffle and Sort c) Shuffle and Map d) Sort and Reduce e) None of these
125.A ________ serves as the master and there is only one NameNode per cluster. a) Data Node b) NameNode c) Data block d) Replication e) None of these
126.HDFS works in a __________ fashion. a) master-worker b) master-slave c) worker/slave d) all of the mentioned e) None of these
127.________ NameNode is used when the Primary NameNode goes down. a) Rack b) Data c) Secondary d) None of the mentioned e) None of these
128 Point out the wrong statement. a) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file level b) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode c) User data is stored on the local file system of DataNodes d) DataNode is aware of the files to which the blocks stored on it belong to e) None of these
129 The need for data replication can arise in various scenarios like ____________ a) Replication Factor is changed b) DataNode goes down c) Data Blocks get corrupted d) All of the mentioned e) None of these
130. For YARN, the ___________ Manager UI provides host and port information. a) Data Node b) NameNode c) Resource d) Replication e) None of these
131. HDFS works in a __________ fashion. a) worker-master fashion b) master-slave fashion c) master-worker fashion d) slave-master e) None of these
132. HDFS is implemented in the _____________ language. a) C b) Perl c) Python d) Java e) None of these
133. The default block size in Hadoop is ______. a) 16MB b) 32MB c) 64MB d) 128MB e) None of these
134. ____ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data. a) MapReduce b) Mahout c) Oozie d) Hbase e) None of these
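Question 134's programming model is easiest to see on the classic word-count example. The Python sketch below is only a local simulation of the idea (no Hadoop involved; the helper names are invented): map emits (word, 1) pairs, a sort/group step stands in for shuffle-and-sort, and reduce sums the counts for each word.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    # Reduce: sum all counts emitted for the same word.
    return (word, sum(counts))

lines = ["the quick brown fox", "the lazy dog", "the fox"]

# Stand-in for the shuffle-and-sort phase: sort and group intermediate pairs by key.
intermediate = sorted(pair for line in lines for pair in mapper(line))
result = [reducer(word, [count for _, count in group])
          for word, group in groupby(intermediate, key=itemgetter(0))]
print(result)  # e.g. ('fox', 2) and ('the', 3) appear among the output pairs
```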
135. Mapper and Reducer implementations can use the ________ to report progress or just indicate that they are alive. a) Partitioner b) OutputCollector c) Reporter d) All of the above e) None of these
136. ________ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer. a) Partitioner b) OutputCollector c) Reporter d) All of the above e) None of these
137. A ________ serves as the master and there is only one NameNode per cluster. a) Data Node b) NameNode c) Data block d) Replication e) None of these
138. HDFS works in a ________ fashion. a) master-worker b) master-slave c) worker/slave d) All of the above e) None of these
139. ________ NameNode is used when the Primary NameNode goes down. a) Rack b) Data c) Secondary d) None e) None of these
140. HDFS is implemented in the ________ programming language. a) C++ b) Java c) Scala d) None e) None of these
141. Hadoop was developed by _______________ a) Larry Page b) Doug Cutting c) Mark d) Bill Gates e) None of these
142. The MapReduce algorithm contains two important tasks, namely __________. a) mapped, reduce b) mapping, Reduction c) Map, Reduction d) Map, Reduce e) None of these
143. Mapper and reducer classes extend classes from the ________ package. a) org.apache.hadoop.mapreduce b) apache.hadoop c) org.mapreduce d) hadoop.mapreduce e) None of these
144. HDFS is inherited from the ------------- file system. a) Yahoo b) FTFS c) Google d) Rediff e) None of these
145. ________ NameNode is used when the Primary NameNode goes down. a) Rack b) Data c) Secondary d) Primary e) None of these
146. HDFS works in a ________ fashion. a) master-worker b) master-slave c) worker/slave d) All of the above e) None of these
147. A ________ serves as the master and there is only one NameNode per cluster. a) Data Node b) NameNode c) Data block d) Replication e) None of these
148. HDFS provides a command line interface called ________ used to interact with HDFS. a) HDFS Shell b) FS Shell c) DFSA Shell d) No shell e) None of these
149. ________ is the slave/worker node and holds the user data in the form of Data Blocks. a) DataNode b) NameNode c) Data block d) Replication e) None of these
150. ________ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution. a) Map Parameters b) JobConf c) MemoryConf d) All of the above e) None of these
Data Analytics KIT-601 Answer key

UNIT-1   UNIT-2   UNIT-3    UNIT-4     UNIT-5
1-b      31-b     61-b      91-d       121-a
2-d      32-b     62-b      92-b       122-b
3-d      33-b     63-d      93-d       123-d
4-b      34-c     64-b      94-c       124-a
5-b      35-a     65-d      95-a       125-b
6-c      36-b     66-a      96-c       126-a
7-c      37-b     67-a      97-a       127-c
8-c      38-a     68-b      98-a       128-a
9-d      39-d     69-b      99-a       129-d
10-a     40-d     70-a      100-b      130-c
11-c     41-d     71-b      101-d      131-b
12-a     42-b     72-b      102-b      132-d
13-b     43-a     73-e      103-a      133-c
14-a     44-a     74-d      104-a      134-a
15-b     45-b     75-a      105-b      135-c
16-c     46-d     76-b      106-a,d    136-b
17-d     47-b     77-a,b    107-a      137-b
18-b     48-a     78-d      108-b      138-a
19-a     49-c     79-b      109-a      139-c
20-c     50-b     80-d      110-a      140-b
21-a     51-a     81-b      111-a,b    141-b
22-d     52-d     82-b      112-a      142-d
23-d     53-a     83-c,d    113-a      143-a
24-d     54-d     84-a,b    114-b      144-c
25-a     55-c     85-a      115-c,d    145-c
26-b     56-d     86-b      116-c      146-b
27-a     57-b     87-a,b    117-a,b    147-b
28-d     58-b     88-c      118-b      148-b
29-a     59-d     89-a,d    119-d      149-a
30-b     60-b     90-c      120-a      150-b
***************Data Analytics MCQs Set - 1***************
1. The branch of statistics which deals with development of particular statistical methods
is classified as
1. industry statistics
2. economic statistics
3. applied statistics
4. applied statistics
Answer: applied statistics
2. Which of the following is true about regression analysis?
1. answering yes/no questions about the data
2. estimating numerical characteristics of the data
3. modeling relationships within the data
4. describing associations within the data
Answer: modeling relationships within the data
3. Text Analytics, also referred to as Text Mining?
1. True
2. False
3. Can be true or False
4. Can not say
Answer: True
4. What is a hypothesis?
1. A statement that the researcher wants to test through the data collected in a study.
2. A research question the results will answer.
3. A theory that underpins the study.
4. A statistical method for calculating the extent to which the results could have happened by
chance.
Answer: A statement that the researcher wants to test through the data collected in a study.
5. What is the cyclical process of collecting and analysing data during a single research
study called?
1. Interim Analysis
2. Inter analysis
3. inter item analysis
4. constant analysis
Answer: Interim Analysis
6. The process of quantifying data is referred to as ____
1. Topology
2. Digramming
3. Enumeration
4. coding
Answer: Enumeration
7. An advantage of using computer programs for qualitative data is that they _
1. Can reduce time required to analyse data (i.e., after the data are transcribed)
2. Help in storing and organising data
3. Make many procedures available that are rarely done by hand due to time constraints
4. All of the above
Answer: All of the Above
8. Boolean operators are words that are used to create logical combinations.
1. True
2. False
Answer: True
9. ______ are the basic building blocks of qualitative data.
1. Categories
2. Units
3. Individuals
4. None of the above
Answer: Categories
10. This is the process of transforming qualitative research data from written interviews or field notes into typed text.
1. Segmenting
2. Coding
3. Transcription
4. Mnemoning
Answer: Transcription
11. A challenge of qualitative data analysis is that it often includes data that are unwieldy and complex; it is a major challenge to make sense of the large pool of data.
1. True
2. False
Answer: True
12. Hypothesis testing and estimation are both types of descriptive statistics.
1. True
2. False
Answer: False
13. A set of data organised in a participants(rows)-by-variables(columns) format is known as a “data set.”
1. True
2. False
Answer: True
14. A graph that uses vertical bars to represent data is called a ___
1. Line graph
2. Bar graph
3. Scatterplot
4. Vertical graph
Answer: Bar graph
15. ____ are used when you want to visually examine the relationship between two
quantitative variables.
1. Bar graph
2. pie graph
3. line graph
4. Scatterplot
Answer: Scatterplot
16. The denominator (bottom) of the z-score formula is
1. The standard deviation
2. The difference between a score and the mean
3. The range
4. The mean
Answer: The standard deviation
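A short worked example of the z-score formula behind question 16, with made-up numbers:

```latex
z = \frac{x - \mu}{\sigma}, \qquad
\text{e.g. } x = 85,\ \mu = 70,\ \sigma = 10
\;\Rightarrow\; z = \frac{85 - 70}{10} = 1.5
```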
17. Which of these distributions is used for testing a hypothesis?
1. Normal Distribution
2. Chi-Squared Distribution
3. Gamma Distribution
4. Poisson Distribution
Answer: Chi-Squared Distribution
18. A statement made about a population for testing purpose is called?
1. Statistic
2. Hypothesis
3. Level of Significance
4. Test-Statistic
Answer: Hypothesis
19. If the assumed hypothesis is tested for rejection considering it to be true is called?
1. Null Hypothesis
2. Statistical Hypothesis
3. Simple Hypothesis
4. Composite Hypothesis
Answer: Null Hypothesis
20. If the null hypothesis is false then which of the following is accepted?
1. Null Hypothesis
2. Positive Hypothesis
3. Negative Hypothesis
4. Alternative Hypothesis.
Answer: Alternative Hypothesis.
21. Alternative Hypothesis is also called as?
1. Composite hypothesis
2. Research Hypothesis
3. Simple Hypothesis
4. Null Hypothesis
Answer: Research Hypothesis
*************** Data Analytics MCQs Set – 2 ***************
1. What is the minimum no. of variables/ features required to perform clustering?
1. 0
2. 1
3. 2
4. 3
Answer: 1
2. For two runs of K-Mean clustering is it expected to get same clustering results?
1. Yes
2. No
Answer: No
3. Which of the following algorithm is most sensitive to outliers?
1. K-means clustering algorithm
2. K-medians clustering algorithm
3. K-modes clustering algorithm
4. K-medoids clustering algorithm
Answer: K-means clustering algorithm
4. The discrete variables and continuous variables are two types of
1. Open end classification
2. Time series classification
3. Qualitative classification
4. Quantitative classification
Answer: Quantitative classification
5. Bayesian classifiers is
1. A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
2. Any mechanism employed by a learning system to constrain the search space of a hypothesis
3. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
4. None of these
Answer: A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
6. Classification accuracy is
1. A subdivision of a set of examples into a number of classes
2. Measure of the accuracy, of the classification of a concept that is given by a certain theory
3. The task of assigning a classification to a set of examples
4. None of these
Answer: Measure of the accuracy, of the classification of a concept that is given by a certain theory
7. Euclidean distance measure is
1. A stage of the KDD process in which new data is added to the existing selection.
2. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
3. The distance between two points as calculated using the Pythagoras theorem
4. none of above
Answer: The distance between two points as calculated using the Pythagoras theorem
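For question 7, a quick worked instance of the Euclidean (Pythagorean) distance between two made-up points:

```latex
d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}, \qquad
p = (1, 2),\ q = (4, 6)
\;\Rightarrow\; d = \sqrt{3^2 + 4^2} = 5
```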
8. Hybrid is
1. Combining different types of method or information
2. Approach to the design of learning algorithms that is structured along the lines of the theory of evolution.
3. Decision support systems that contain an information base filled with the knowledge of an expert formulated in terms of if-then rules.
4. none of above
Answer: Combining different types of method or information
9. Decision trees use ________, in that they always choose the option that seems the best available at that moment.
1. Greedy Algorithms
2. divide and conquer
3. Backtracking
4. Shortest path algorithm
Answer: Greedy Algorithms
10. Discovery is
1. It is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information).
2. The process of extracting implicit, previously unknown and potentially useful information from data
3. An extremely complex molecule that occurs in human chromosomes and that carries genetic
information in the form of genes.
4. None of these
Answer: The process of extracting implicit, previously unknown and potentially useful information from data
11. Hidden knowledge referred to
1. A set of databases from different vendors, possibly using different database paradigms
2. An approach to a problem that is not guaranteed to work but performs well in most cases
3. Information that is hidden in a database and that cannot be recovered by a simple SQL query.
4. None of these
Answer: Information that is hidden in a database and that cannot be recovered by a simple SQL query.
12. Decision trees cannot handle categorical attributes with many distinct values, such as country codes for telephone numbers.
1. True
2. False
Answer: False
13. Enrichment is
1. A stage of the KDD process in which new data is added to the existing selection
2. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
3. The distance between two points as calculated using the Pythagoras theorem.
4. None of these
Answer: A stage of the KDD process in which new data is added to the existing selection
14. ________ are easy to implement and can execute efficiently even without prior knowledge of the data; they are among the most popular algorithms for classifying text documents.
1. ID3
2. Naive Bayes classifiers
3. CART
4. None of above
Answer: Naive Bayes classifiers
15. High entropy means that the partitions in classification are
1. Pure
2. Not Pure
3. Useful
4. useless
Answer: Not Pure
16. Which of the following statements about Naive Bayes is incorrect?
1. Attributes are equally important.
2. Attributes are statistically dependent of one another given the class value.
3. Attributes are statistically independent of one another given the class value.
4. Attributes can be nominal or numeric
Answer: Attributes are statistically dependent of one another given the class value.
17. The maximum value for entropy depends on the number of classes so if we have 8 Classes what will be the max entropy.
1. Max Entropy is 1
2. Max Entropy is 2
3. Max Entropy is 3
4. Max Entropy is 4
Answer: Max Entropy is 3
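The arithmetic behind question 17: with C equally likely classes the entropy is maximised at log2 C, so for 8 classes the maximum is 3 bits.

```latex
H_{\max} = \log_2 C, \qquad C = 8 \;\Rightarrow\; H_{\max} = \log_2 8 = 3 \text{ bits}
```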
18. Point out the wrong statement.
1. k-nearest neighbor is same as k-means
2. k-means clustering is a method of vector quantization
3. k-means clustering aims to partition n observations into k clusters
4. none of the mentioned
Answer: k-nearest neighbor is same as k-means
19. Consider the following example “How we can divide set of articles such that those articles have the same theme (we do not know the theme of the articles ahead of time) ” is this:
1. Clustering
2. Classification
3. Regression
4. None of these
Answer: Clustering
20. Can we use K Mean Clustering to identify the objects in video?
1. Yes
2. No
Answer: Yes
21. Clustering techniques are ________ in the sense that the data scientist does not determine, in advance, the labels to apply to the clusters.
1. Unsupervised
2. supervised
3. Reinforcement
4. Neural network
Answer: Unsupervised
22. The ________ metric is examined to determine a reasonably optimal value of k.
1. Mean Square Error
2. Within Sum of Squares (WSS)
3. Speed
4. None of these
Answer: Within Sum of Squares (WSS)
23. If an itemset is considered frequent, then any subset of the frequent itemset must also be frequent.
1. Apriori Property
2. Downward Closure Property
3. Either 1 or 2
4. Both 1 and 2
Answer: Both 1 and 2
24. If {bread, eggs, milk} has a support of 0.15 and {bread, eggs} also has a support of 0.15, the confidence of the rule {bread, eggs} => {milk} is
1. 0
2. 1
3. 2
4. 3
Answer: 1
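The calculation behind question 24, using the support values given in the question:

```latex
\mathrm{conf}(\{bread, eggs\} \Rightarrow \{milk\})
= \frac{\mathrm{supp}(\{bread, eggs, milk\})}{\mathrm{supp}(\{bread, eggs\})}
= \frac{0.15}{0.15} = 1
```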
25. Confidence is a measure of how X and Y are really related rather than coincidentally happening together.
1. True
2. False
Answer: False
26. ________ recommend items based on similarity measures between users and/or items.
1. Content Based Systems
2. Hybrid System
3. Collaborative Filtering Systems
4. None of these
Answer: Collaborative Filtering Systems
27. There are ________ major classifications of Collaborative Filtering Mechanisms.
1. 1
2. 2
3. 3
4. none of above
Answer: 2
28. Movie Recommendation to people is an example of
1. User Based Recommendation
2. Item Based Recommendation
3. Knowledge Based Recommendation
4. content based recommendation
Answer: Item Based Recommendation
29. ________ recommenders rely on an explicitly defined set of recommendation rules
1. Constraint Based
2. Case Based
3. Content Based
4. User Based
Answer: Case Based
30. Parallelized hybrid recommender systems operate dependently of one another and produce separate recommendation lists.
1. True
2. False
Answer: False
COURSE: B.Tech., VI SEM, MCQ Assignment (2020-21), Even Semester, UNIT 1, Data Analytics (KIT601)
1. The data with no pre-defined organizational form or specific format is
a. Semi-structured data b. Unstructured data c. Structured data d. None of these
Ans. b
2. The data which can be ordered or ranked according to some relationship to one another is
a. Categorical data b. Interval data c. Ordinal data d. Ratio data
Ans. c
3. Predict the future by examining historical data, detecting patterns or relationships in these data, and then extrapolating these relationships forward in time. a. Prescriptive model b. Descriptive model c. Predictive model d. None of these
Ans. c
4. Person responsible for the genesis of the project, providing the impetus for the project and core business problem, generally provides the funding and will gauge the degree of value from the final outputs of the working team is a. Business User b. Project Sponsor c. Business Intelligence Analyst d. Data Engineer
Ans. b
5. Deep technical skills to assist with tuning SQL queries for data management, extraction and support data ingest to analytic sandbox is handled by ___________. a. Data Engineer b. Business User c. Project Sponsor d. Business Intelligence Analyst
Ans. a
6. Business domain expertise with deep understanding of the data, KPIs, key metrics and business intelligence from a reporting perspective is key role of ____________.
a. Business User b. Project Sponsor c. Business Intelligence Analyst d. Data Engineer
Ans. c
7. _____________ is concerned with uncertainty or inaccuracy of the data.
a. Volume b. Velocity c. Variety d. Veracity
Ans. d
Ans. d
Ans. d
Ans. True
11. The process of exploring data and reports in order to extract meaningful insights, which can be used to better understand and improve business performance.
a. Reporting b. Analysis c. Summarizing d. None of these
Ans. b
Ans. a
8. What are the V’s in the characteristics of Big data? a. Volume b. Velocity c. Variety d. All of these
9. What are the types of reporting in data analytics?
a. Canned reports b. Dashboard reports c. Alert reports d. All of above
10.Massive Parallel Processing (MPP) database breaks the data into independent chunks with independent disk and CPU resources.
a. True b. False
12. The key components of an analytical sandbox are: (i) Business analytics (ii) Analytical sandbox platform (iii) Data access and delivery (iv) Data sources
a. True b. False
Ans. b
14. In which phase do you prepare an analytic sandbox, in which you can work for the duration of the project; perform ELT and ETL to get data into the sandbox, and begin transforming the data so you can work with it and analyze it; and familiarize yourself with the data thoroughly and take steps to condition the data?
a. Data preparation b. Discovery c. Data Modelling d. Data Building Ans. a
Ans.b
Ans. a
13. In the ____________ phase you learn the business domain, including relevant history, such as whether the organization or business unit has attempted similar projects in the past, from which you can learn. Assess the resources you will have to support the project, in terms of people, technology, time, and data. Frame the business problem as an analytic challenge that can be addressed in subsequent phases. Formulate initial hypotheses (IH) to test and begin learning the data. a. Data preparation b. Discovery c. Data Modelling d. Data Building
15. Which phase uses SQL, Python, R, or excel to perform various data modifications and transformations.
a. Data preparation b. Data cleaning c. Data Modelling d. Data Building
16. By definition, Database Administrator is a person who ___________
a. Provisions and configures database environment to support the analytical needs of the working team. b. Ensure key milestones and objectives are met on time and at expected quality. c. Deep technical skills to assist with tuning SQL queries for data management, extraction and support data ingest to analytic sandbox. d. None of these
Ans. a
Ans. c
Ans. b
Ans .b
17. ETL stands for
a. Extract, Load, Transform b. Evaluate, Transform ,Load c. Extract , Loss , Transform d. None of the above
18. The phase Develop data sets for testing, training, and production purposes. Get the best environment you can for executing models and workflows, including fast hardware and parallel processing is referred to as
a. Data preparation b. Discovery c. Data Modelling d. Data Building
19. Which of the following is not a major data analysis approaches?
a. Data Mining b. Predictive Intelligence c. Business Intelligence d. Text Analytics
20. User rating given to a movie in a scale 1-10, can be considered as an attribute of type?
a. Nominal b. Ordinal c. Interval d. Ratio
Ans. d
22. The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
a. TRUE b. FALSE c. Can be true or false d. Cannot say
Ans. a
Ans. b
Ans.b
25. The Process of describing the data that is huge and complex to store and process is known as
a. Analytics b. Data mining c. Big Data d. Data Warehouse
21. Data Analysis is defined by the statistician?
a. William S. b. Hans Peter Luhn c. Gregory Piatetsky-Shapiro d. John Tukey
23. Which of the following is not a major data analysis approaches?
a. Data Mining b. Predictive Intelligence c. Business Intelligence d. Text Analytics
24. Which of the following step is performed by data scientist after acquiring the data?
a. Data Cleansing b. Data Integration c. Data Replication d. All of the mentioned
Ans. c
26. Data generated from online transactions is one of the examples for volume of big data. Is this true or false? a. TRUE b. FALSE
Ans. a
27. Velocity is the speed at which the data is processed
a. TRUE b. FALSE
Ans. b
28. _____________ have a structure but cannot be stored in a database.
a. Structured b. Semi-Structured c. Unstructured d. None of these
Ans. b
29. ____________refers to the ability to turn your data useful for business.
a. Velocity b. Variety c. Value d. Volume
Ans. c
30. Value tells the trustworthiness of data in terms of quality and accuracy.
a. TRUE b. FALSE
Ans.b
NPTEL Questions
31. Analysing the data to answer why some phenomenon related to learning happened is a type of
a. Descriptive Analytics b. Diagnostic Analytics
c. Predictive Analytics d. Prescriptive Analytics
Ans. B
32. Analysing the data to answer what will happen next is a type of
a. Descriptive Analytics b. Diagnostic Analytics c. Predictive Analytics d. Prescriptive Analytics
Ans. D
33. Learning analytics at institutions/University, regional or national level is termed as
a. Educational data mining b. Business intelligence c. Academic analytics d. None of the above
Ans. C
34. Which of the following questions is not a type of Predictive Analytics?
a. What is the average score of all students in the CBSE 10th Maths Exam? b. What will be the performance of a students in next questions? c. Which courses will the student take in the next semester? d. What is the average attendance of the class over the semester
Ans A,D
35. A course instructor has data about students' attendance in her course in the past semester. Based on this data, she constructs a line graph. What type of analytics is she doing?
a. Descriptive Analytics b. Diagnostic Analytics c. Predictive Analytics d. Prescriptive Analytics
Ans. A
36. She then correlates the attendance with their final exam scores. She realizes that students who score 90% and above also have an attendance of more than 75%. What type of analytics is she doing?
a. Descriptive Analytics b. Diagnostic Analytics c. Predictive Analytics d. Prescriptive Analytics
Ans. B
38. Why should one not go for sampling?
a. Less costly to administer than a census. b. The person authorizing the study is comfortable with the sample. c. Because the research process is sometimes destructive d. None of the above
Ans. d
39. Stratified random sampling is a method of selecting a sample in which:
a. the sample is first divided into strata, and then random samples are taken from each stratum b. various strata are selected from the sample c. the population is first divided into strata, and then random samples are drawn from each stratum d. None of these alternatives is correct.
Ans. c
SET II
1. Data Analysis is defined by the statistician?
a. William S. b. Hans Peter Luhn c. Gregory Piatetsky-Shapiro d. John Tukey
Ans. d
2. What is classification?
a) deciding what features to use in a pattern recognition problem b) deciding what class an input pattern belongs to c) deciding what type of neural network to use d) none of the mentioned
Ans. B
3. Data in ___________ bytes size is called Big Data.
A. Tera B. Giga C. Peta D. Meta
Ans : C
Explanation: Data in Petabytes, i.e. 10^15 bytes in size, is called Big Data.
4. How many V's of Big Data are there?
A. 2 B. 3 C. 4 D. 5
Ans : D
Explanation: Big Data was defined by the “3Vs” but now there are “5Vs” of Big Data which are Volume, Velocity, Variety, Veracity, Value
5. Transaction data of the bank is?
A. structured data B. unstructured data C. Both A and B D. None of the above
Ans : A
Explanation: Data which can be saved in tables is structured data, like the transaction data of a bank.
6. In how many forms can Big Data be found?
A. 2 B. 3 C. 4 D. 5
Ans : B
Explanation: Big Data can be found in three forms: structured, unstructured and semi-structured.
7. Which of the following are benefits of Big Data processing?
A. Businesses can utilize outside intelligence while taking decisions B. Improved customer service C. Better operational efficiency D. All of the above
Ans : D
Explanation: All of the above are Benefits of Big Data Processing.
8. Which of the following is not a Big Data technology?
A. Apache Hadoop B. Apache Spark C. Apache Kafka D. Apache Pytarch
Ans : D
Explanation: Apache Pytarch is not a Big Data technology.
9. What percentage of the world's total data has been created just within the past two years?
A. 80% B. 85% C. 90% D. 95%
Ans : C
Explanation: 90% of the world's total data has been created just within the past two years.
10. Apache Kafka is an open-source platform that was created by?
A. LinkedIn B. Facebook
C. Google D. IBM
Ans : A
Explanation: Apache Kafka is an open-source platform that was created by LinkedIn in the year 2011.
11. What was Hadoop named after?
A. Creator Doug Cutting’s favorite circus act B. Cuttings high school rock band C. The toy elephant of Cutting’s son D. A sound Cutting’s laptop made during Hadoop development
Ans : C
Explanation: Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant.
12. What are the main components of Big Data?
A. MapReduce B. HDFS C. YARN D. All of the above
Ans : D
Explanation: All of the above are the main components of Big Data.
13. Point out the correct statement.
A. Hadoop needs specialized hardware to process the data B. Hadoop 2.0 allows live stream processing of real-time data C. In the Hadoop programming framework output files are divided into lines or records D. None of the above
Ans : B
Explanation: Hadoop batch-processes data distributed over a number of computers, ranging in the 100s and 1000s.
14. Which of the following fields come under the umbrella of Big Data?
A. Black Box Data B. Power Grid Data
C. Search Engine Data D. All of the above
Ans : D
Explanation: All of the above fields come under the umbrella of Big Data.
15. Which of the following is not an example of Social Media? 1. Twitter 2. Google 3. Instagram 4. Youtube
ANs: 2 (Google)
16. By 2025, the volume of digital data will increase to 1. TB 2. YB 3. ZB 4. EB Ans: 3 ZB
17. Data Analysis is a process of 1. inspecting data 2. cleaning data 3. transforming data 4. All of Above
Ans. 4 All of above
18. Which of the following is not a major data analysis approaches? 1. Data Mining 2. Predictive Intelligence 3. Business Intelligence 4. Text Analytics
Ans. 2 Predictive Intelligence
19. The Process of describing the data that is huge and complex to store and process is known as 1. Analytics 2. Data mining 3. Big data 4. Data warehouse
Ans. 3 Big data
20. In descriptive statistics, data from the entire population or a sample is summarized with ?
1. Integer descriptor 2. floating descriptor 3. numerical descriptor 4. decimal descriptor
Ans. 3 numerical descriptor
21. Data generated from online transactions is one of the examples of the volume of big data 1. TRUE 2. FALSE
TRUE
22. Velocity is the speed at which the data is processed 1. True 2. False
False
23. Value tells the trustworthiness of data in terms of quality and accuracy 1. TRUE 2. FALSE
False
24. Hortonworks was introduced by Cloudera and owned by Yahoo 1. True 2. False
False
25. ____ refers to the ability to turn your data into value for the business 1. Velocity 2. Variety 3. Value 4. Volume
Ans. 3 Value
26. Data Analysis is defined by the statistician? 1. William S. 2. Hans Peter Luhn 3. Gregory Piatetsky-Shapiro 4. John Tukey
Ans. 4 John Tukey
27. Files are divided into ____ sized Chunks. 1. Static 2. Dynamic 3. Fixed 4. Variable
Ans. 3 Fixed
28. _____ is an open source framework for storing data and running application on clusters of commodity hardware. 1. HDFS 2. Hadoop 3. MapReduce 4. Cloud
Ans. 2 Hadoop
29. ____ is a factor considered before adopting Big Data technology 1. Validation 2. Verification 3. Data 4. Design
Ans. 1 Validation
30. Which among the following is not a data mining and analytical application? 1. profile matching 2. social network analysis 3. facial recognition 4. Filtering
Ans. 4 Filtering
31. Which capability allows a storage subsystem to support massive data volumes of increasing size? 1. Extensibility 2. Fault tolerance 3. Scalability 4. High-speed I/O capacity
Ans. 3 Scalability
32. ______ is a programming model for writing applications that can process Big Data in parallel on multiple nodes.
1. HDFS 2. MAP REDUCE 3. HADOOP 4. HIVE Ans. MAP REDUCE
33. How many main statistical methodologies are used in data analysis?
A. 2 B. 3 C. 4 D. 5
Ans : A
Explanation: In data analysis, two main statistical methodologies are used Descriptive statistics and Inferential statistics.
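As a minimal illustration of the two methodologies (not part of the original question bank; the scores are made-up values and only the Python standard library is used): descriptive statistics summarize the sample itself, while inferential statistics use the sample to say something about the wider population, for example via a confidence interval.
import statistics, math

scores = [62, 71, 58, 90, 85, 77, 69, 73, 88, 65]   # hypothetical sample of exam scores

# Descriptive statistics: summarize the observed data with numerical descriptors.
mean = statistics.mean(scores)
stdev = statistics.stdev(scores)
print("sample mean:", mean, "sample std dev:", round(stdev, 2))

# Inferential statistics: estimate a population parameter from the sample,
# here a rough 95% confidence interval for the population mean (assumes normality).
margin = 1.96 * stdev / math.sqrt(len(scores))
print("approx. 95% CI for population mean:", (round(mean - margin, 2), round(mean + margin, 2)))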
34. The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
A. TRUE B. FALSE C. Can be true or false D. Can not say
Ans : A
Explanation: The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
35. The branch of statistics which deals with development of particular statistical methods is classified as 1. industry statistics 2. economic statistics 3. applied statistics 4. applied statistics
Ans. applied statistics
36. Point out the correct statement. a) Descriptive analysis is the first kind of data analysis performed b) Descriptions can be generalized without statistical modelling
c) Description and Interpretation are the same in descriptive analysis d) None of the mentioned
Answer: a Explanation: Descriptive analysis describes a set of data, and it is the first kind of data analysis performed on a data set.
37. What are the five V’s of Big Data?
A. Volume
B. Velocity
C. Variety
D. All the above
Answer: Option D
38. What are the main components of Big Data?
A. MapReduce
B. HDFS
C. YARN
D. All of these
Answer: Option D
39. What are the different features of Big Data Analytics?
A. Open-Source
B. Scalability
C. Data Recovery
D. All the above
Answer: Option D
40. Which of the following refers to the problem of finding abstracted patterns (or structures) in the unlabeled data?
A. Supervised learning
B. Unsupervised learning
C. Hybrid learning
D. Reinforcement learning
Answer: B
Explanation: Unsupervised learning is a type of machine learning algorithm that is generally used to find the hidden structured and patterns in the given unlabeled data.
41. Which one of the following refers to querying the unstructured textual data?
A. Information access
B. Information update
C. Information retrieval
D. Information manipulation
Answer: C
Explanation: Information retrieval refers to querying unstructured textual data. Information retrieval can also be understood as the activity (or process) of obtaining, from a huge collection of information, the resources that are relevant to the information required.
42. For what purpose, the analysis tools pre-compute the summaries of the huge amount of data?
A. In order to maintain consistency
B. For authentication
C. For data access
D. To obtain the queries response
Answer: d
Explanation: Whenever a query is fired, its response should be returned as quickly as possible. So, to speed up query responses, the analysis tools pre-compute summaries of the huge amount of data. For example, when a keyword is typed into a Google search, Google's analytical tools use pre-computed summaries of large amounts of data to provide a quick output related to that keyword.
43. Which one of the following statements is not correct about the data cleaning?
a. It refers to the process of data cleaning
b. It refers to the transformation of wrong data into correct data
c. It refers to correcting inconsistent data
d. All of the above
Answer: d
Explanation: Data cleaning is a process applied to a data set to remove noisy data and inconsistent data. It also involves transformation, in which wrong data is transformed into correct data. In other words, data cleaning is a kind of pre-processing in which the given set of data is prepared for the data warehouse.
44. Any data with unknown form or the structure is classified as _ data. a. Structured b. Unstructured c. Semi-structured d. None of above Ans. b
45.____ means relating to the issuing of reports. a. Analysis b. Reporting c. Reporting and Analysis d. None of the above
Ans. b
46. Veracity involves the reliability of the data; this is ________ due to the numerous data sources of big data a) Easy and difficult b) Easiness c) Demanding d) None of these
Ans. c
47. ____ is a process of defining the measurement of a phenomenon that is not directly measurable, though its existence is implied by other phenomena. a. Data preparation b. Model planning c. Communicating results d. Operationalization
Ans. d
48. _____data is data whose elements are addressable for effective analysis.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. a
49. ______data is information that does not reside in a relational database but that have some organizational properties that make it easier to analyze.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. b
50. ______data is a data which is not organized in a predefined manner or does not have a predefined data model, thus it is not a good fit for a mainstream relational database.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. c
51. There are ___ types of big data.
a. 2 b. 3 c. 4 d. 5
Ans. b
52. Google search is an example of _________ data.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. c
KIET Group of Institutions
Department of IT
Course: B.Tech., VI Sem, MCQ Assignment (2020-21), Even Semester, Unit 2, Data Analytics (KIT601)
1. Maximum aposteriori classifier is also known as: a. Decision tree classifier b. Bayes classifier c. Gaussian classifier d. Maximum margin classifier
Ans. B
2. Which of the following sentences is FALSE regarding regression?
a. It relates inputs to outputs. b. It is used for prediction. c. It may be used for interpretation. d. It discovers causal relationships.
Ans. d
3. Suppose you are working on stock market prediction, and you would like to predict the price of a particular stock tomorrow (measured in dollars).
You want to use a learning algorithm for this.
a. Regression b. Classification c. Clustering d. None of these
Ans. a
4. In binary logistic regression:
a. The dependent variable is divided into two equal subcategories. b. The dependent variable consists of two categories. c. There is no dependent variable. d. The dependent variable is continuous.
Ans. b
5. A fair six-sided die is rolled twice. What is the probability of getting 4 on the first roll and not getting 6 on the second roll?
a. 1/36 b. 5/36 c. 1/12 d. 1/9
Ans. b
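A quick way to check this answer: the two rolls are independent, so P(4 on the first roll) × P(not 6 on the second roll) = (1/6) × (5/6) = 5/36. The short check below (an illustrative sketch, not part of the original question bank; the trial count is arbitrary) confirms the value both exactly and empirically.
from fractions import Fraction
import random

# Exact probability: independent events multiply.
exact = Fraction(1, 6) * Fraction(5, 6)
print("exact:", exact)          # 5/36

# Monte Carlo check with a hypothetical number of trials.
trials = 100_000
hits = sum(1 for _ in range(trials)
           if random.randint(1, 6) == 4 and random.randint(1, 6) != 6)
print("simulated:", hits / trials)   # close to 5/36, about 0.139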
6. The parameter β0 is termed as intercept term and the parameter β1 is termed as slope parameter. These parameters are usually called as _________
a. Regressionists b. Coefficients c. Regressive d. Regression coefficients
Ans. d
7. ________ is a simple approach to supervised learning. It assumes that the dependence of Y on X1, X2… Xp is linear.
a. Gradient Descent b. Linear regression
c. Logistic regression d. Greedy algorithms
Ans. b
8. What makes the interpretation of conditional effects extra challenging in logistic regression?
a. It is not possible to model interaction effects in logistic regression b. The maximum likelihood estimation makes the results unstable c. The conditional effect is dependent on the values of all X-variables d. The results has to be raised by its natural logarithm.
Ans. c
9. If there were a perfect positive correlation between two interval/ratio variables, the Pearson's r test would give a correlation coefficient of:
a. - 0.328 b. +1 c. +0.328 d. – 1
Ans.b
10. Logistic Regression transforms the output probability to in a range of [0, 1]. Which of the following function is used for this purpose?
a. Sigmoid b. Mode c. Square d. All of these
Ans.a
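For context, the sigmoid (logistic) function squashes any real-valued score into the range [0, 1], which is why logistic regression uses it to turn a linear score into a probability. The snippet below is a minimal sketch in Python; the sample inputs are illustrative, not from the question bank.
import math

def sigmoid(z):
    # Maps any real number z to a value strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))

# Large negative scores go towards 0, large positive scores towards 1.
for z in (-5.0, 0.0, 5.0):
    print(z, "->", round(sigmoid(z), 4))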
12. Generally which of the following method(s) is used for predicting continuous dependent variable?
1. Linear Regression 2. Logistic Regression
a. 1 and 2
b. only 1 c. only 2 d. None of these
Ans.b
13. Mean of the set of numbers {1, 2, 3, 4, 5} is?
a. 2 b. 3 c. 4 d. 5
Ans.b
14. Name of a movie, can be considered as an attribute of type?
a. Nominal
b. Ordinal
c. Interval
d. Ratio
Ans.a
15. Let A be an example, and C be a class. The probability P(C) is known as:
a. Apriori probability
b. Aposteriori probability
c. Class conditional probability
d. None of the above
Ans.a
16. Consider two binary attributes X and Y. We know that the attributes are independent and probability P(X=1) = 0.6, and P(Y=0) = 0.4. What is the probability that both X and Y have values 1?
a. 0.06 b. 0.16 c. 0.26 d. 0.36
Ans. d
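Worked reasoning for this answer: independence means P(X=1, Y=1) = P(X=1) × P(Y=1), and P(Y=1) = 1 − P(Y=0) = 0.6, so the joint probability is 0.6 × 0.6 = 0.36. The short check below simply restates that arithmetic.
p_x1 = 0.6
p_y1 = 1 - 0.4          # P(Y=1) = 1 - P(Y=0)
print(p_x1 * p_y1)      # 0.36, because independence gives P(X=1, Y=1) = P(X=1) * P(Y=1)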
17. In regression the output is a. Discrete b. Continuous c. Continuous and always lie in same range d. May be discrete and continuous
Ans. b
18. The probabilistic model that finds the most probable prediction using the training data and the space of hypotheses to make a prediction for a new data instance is known as:
a. Concept learning b. Bayes optimal classifier c. EM algorithm d. Logistic regression
Ans. b
19. State whether the following statement is true or not: “In the Bayesian theorem, it is important to find the probability of both the events occurring simultaneously.”
a. True b. False
Ans. b
20. If the correlation coefficient is a positive value, then the slope of the regression line
a. can be either negative or positive
b. must also be positive c. can be zero d. cannot be zero
Ans. b
21. Which of the following is true about Naive Bayes?
a. Assumes that all the features in a dataset are equally important b. Assumes that all the features in a dataset are independent c. Assumes that all the features in a dataset are equally important and are independent. d. None of the above options
Ans. c
22. Previous probabilities in Bayes Theorem that are changed with help of new available information are classified as _______
a. independent probabilities b. posterior probabilities c. interior probabilities d. dependent probabilities
Ans. b
23. Which of the following methods do we use, to find the best fit line for data in Linear Regression?
a. Least Square Error b. Maximum Likelihood c. Logarithmic Loss d. Both A and B
Ans. a
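As an illustrative sketch of the least squares idea (assuming NumPy is available; the data values are made up), the best fit line is the one whose intercept and slope minimize the sum of squared vertical distances between the observed points and the line.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])     # hypothetical observations

# np.polyfit with degree 1 solves the least squares problem for a straight line,
# returning the slope (beta1) and intercept (beta0) that minimize sum((y - (b0 + b1*x))^2).
beta1, beta0 = np.polyfit(x, y, 1)
print("intercept:", round(beta0, 3), "slope:", round(beta1, 3))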
24. What is the consequence between a node and its predecessors while creating Bayesian network?
a. Conditionally dependent b. Dependent c. Conditionally independent d. Both a & b
Ans. c
25. Bayes rule can be used to __________ conditioned on one piece of evidence.
a. Solve queries b. Answer probabilistic queries c. Decrease complexity of queries d. Increase complexity of queries
Ans.b
26. Which of the following options is/are correct in reference to Bayesian Learning?
a. New instances can be classified by combining the predictions of multiple hypotheses, weighted by their probabilities. b. Bayesian methods can accommodate hypotheses that make probabilistic predictions. c. Each observed training example can incrementally decrease or increase the estimated probability that a hypothesis is correct. d. All of the mentioned
Ans. d
27. When is the cell said to be fired? a. if the potential of the body reaches a steady threshold value b. if there is an impulse reaction c. during the upbeat of the heart d. none of the mentioned
Ans. a
28. Which of the following is true about regression analysis?
a. answering yes/no questions about the data b. estimating numerical characteristics of the data c. modeling relationships within the data d. describing associations within the data
Ans.c
29. Suppose you are building an SVM model on data X. The data X can be error-prone, which means that you should not trust any specific data point too much. You want to build an SVM model with a quadratic kernel function (polynomial of degree 2) that uses the slack variable C as one of its hyperparameters. What would happen when you use a very large value of C?
a. We can still classify data correctly for given setting of hyper parameter C b. We cannot classify data correctly for given setting of hyper parameter C. c. Can’t Say
d. None of these
Ans. a
30. What is/are true about the kernel in SVM?
(a) The kernel function maps low dimensional data to a high dimensional space. (b) It is a similarity function.
a. The kernel function maps low dimensional data to a high dimensional space b. It is a similarity function c. The kernel function maps low dimensional data to a high dimensional space and it is a similarity function d. None of these
Ans. c
31. Suppose you have trained an SVM with a linear decision boundary. After training the SVM, you correctly infer that your SVM model is underfitting. Which of the following options would you be more likely to consider for the next iteration of the SVM? a. You want to increase your data points. b. You want to decrease your data points. c. You will try to calculate more variables. d. You will try to reduce the features.
Ans. c
32. Suppose you are using RBF kernel in SVM with high Gamma value. What does this signify?
a. The model would consider even far away points from hyperplane for modeling b. The model would consider only the points close to the hyperplane for modeling. c. The model would not be affected by distance of points from hyperplane for modeling. d. None of these
Ans.b
33. Which of the following can only be used when training data are linearly separable?
a. Linear Logistic Regression. b. Linear Soft margin SVM c. Linear hard-margin SVM d. Parzen windows.
Ans.c
34. Using the kernel trick, one can get non-linear decision boundaries using algorithms designed originally for linear models.
a. True b. False
Ans. a
35. Support vectors are the data points that lie closest to the decision surface.
a. True b. False
Ans. a
36. Which of the following statement is true for a multilayered perceptron?
a. Output of all the nodes of a layer is input to all the nodes of the next layer b. Output of all the nodes of a layer is input to all the nodes of the same layer c. Output of all the nodes of a layer is input to all the nodes of the previous layer d. Output of all the nodes of a layer is input to all the nodes of the output layer
Ans. a
37. Which of the following is/are true regarding an SVM?
a. For two dimensional data points, the separating hyperplane learnt by a linear SVM will be a straight line. b. In theory, a Gaussian kernel SVM cannot model any complex separating hyperplane. c. For every kernel function used in a SVM, one can obtain an equivalent closed form basis expansion. d. Overfitting in an SVM is not a function of number of support vectors.
Ans. a
38. The function of distance that is used to determine the weight of each training example in instance based learning is known as______________
a. Kernel Function b. Linear Function c. Binomial distribution d. All of the above
Ans. a
39. What is the name of the function in the following statement: “A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain value, it outputs a 1, otherwise it just outputs a 0”?
a. Step function b. Heaviside function c. Logistic function d. Binary function
Ans. b
40. Which of the following is true? (i) On average, neural networks have higher computational rates than conventional computers. (ii) Neural networks learn by example. (iii) Neural networks mimic the way the human brain works.
a. All of the mentioned are true b. (ii) and (iii) are true c. (i) and (ii) are true d. Only (i) is true
Ans. a
41. Which of the following is an application of NN (Neural Network)?
a. Sales forecasting b. Data validation c. Risk management d. All of the mentioned
Ans. d
42. A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain value, it outputs a 1, otherwise it just outputs a 0.
a. True b. False
Ans. a
43. In what ways can output be determined from activation value in ANN?
a. Deterministically
b. Stochastically c. both deterministically & stochastically d. none of the mentioned
Ans. c
45. In ANN, the amount of output of one unit received by another unit depends on what?
a. output unit b. input unit c. activation value d. weight
Ans. d
46. Function of dendrites in ANN is
a. receptors b. transmitter c. both receptor & transmitter d. none of the mentioned
Ans. a
47. Which of the following is true? (i) On average, neural networks have higher computational rates than conventional computers. (ii) Neural networks learn by example. (iii) Neural networks mimic the way the human brain works.
a. All of the mentioned are true b. (ii) and (iii) are true c. (i), (ii) and (iii) are true d. Only (i) is true
Ans. a
48. What is the name of the function in the following statement: “A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain value, it outputs a 1, otherwise it just outputs a 0”?
a. Step function b. Heaviside function
c. Logistic function d. Binary function
Ans. b
49. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the constant of proportionality equal to 2. The inputs are 4, 10, 5 and 20 respectively. The output will be
a. 238 b. 76 c. 119 d. 123
Ans. a
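Worked check for this answer: the weighted sum is 1·4 + 2·10 + 3·5 + 4·20 = 119, and a linear transfer function with proportionality constant 2 gives an output of 2 × 119 = 238.
weights = [1, 2, 3, 4]
inputs = [4, 10, 5, 20]

weighted_sum = sum(w * x for w, x in zip(weights, inputs))   # 4 + 20 + 15 + 80 = 119
output = 2 * weighted_sum                                    # linear transfer, constant of proportionality 2
print(output)                                                # 238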
50. Which of the following are real world applications of the SVM?
a. Text and Hypertext Categorization b. Image Classification c. Clustering of News Articles d. All of the above
Ans.d
51. Support vector machine may be termed as:
a. Maximum apriori classifier
b. Maximum margin classifier
c. Minimum apriori classifier
d. Minimum margin classifier
Ans.b
52. What is purpose of Axon? a. receptors b. transmitter c. transmission d. none of the mentioned
53. The model developed from sample data having the form of ŷ = b0 + b1X is known as:
Ans. c (estimated regression equation)
54. In regression analysis, which of the following is not a required assumption about the error term ε?
Ans. a (The expected value of the error term is one)
55. ____________ are algorithms that learn from their more complex environments (hence eco) to generalize, approximate and simplify solution logic.
a. Fuzzy Relational DB
b. Ecorithms
c. Fuzzy Set
d. None of the mentioned
Ans. b
56. The truth values of traditional set theory is ____________ and that of fuzzy set is __________
a. Either 0 or 1, between 0 & 1
b. Between 0 & 1, either 0 or 1
c. Between 0 & 1, between 0 & 1
d. Either 0 or 1, either 0 or 1
Ans. a
57. What is the form of Fuzzy logic?
a. Two-valued logic
b. Crisp set logic
c. Many-valued logic
d. Binary set logic
Ans. c
58. Fuzzy logic is usually represented as ___________
a. IF-THEN rules
b. IF-THEN-ELSE rules
c. Both IF-THEN-ELSE rules & IF-THEN rules
d. None of the mentioned
Ans. a
59. ______________ is/are the way/s to represent uncertainty.
a. Fuzzy Logic
b. Probability
c. Entropy
d. All of the mentioned
Ans.d
60. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the following.
a. AND
b. OR
c. NOT
d. All of mentioned
Ans. d
61. The values of the set membership is represented by ___________
a. Discrete Set
b. Degree of truth
c. Probabilities
d. Both Degree of truth & Probabilities
Ans. b
62. Fuzzy logic is extension of Crisp set with an extension of handling the concept of Partial Truth.
a. True
b. False
Ans. a
SET II
1. Sentiment Analysis is an example of: 1. Regression 2. Classification 3. Clustering 4. Reinforcement Learning
Options: 1. 1, 2 and 4 2. 1, 2 and 3 3. 1 and 3 4. 1 and 2
Ans. 1 (1, 2 and 4)
2. The self-organizing maps can also be considered as the instance of _________ type of learning.
A. Supervised learning B. Unsupervised learning C. Missing data imputation D. Both A & C
Answer: B Explanation: The Self Organizing Map (SOM), or the Self Organizing Feature Map is a kind of Artificial Neural Network which is trained through unsupervised learning.
3. The following statement can be considered as an example of _________
Suppose one wants to predict the number of newborns according to the size of storks' population by performing supervised learning
A. Structural equation modeling B. Clustering C. Regression D. Classification
Answer: C
Explanation: The above-given statement can be considered as an example of regression. Therefore the correct answer is C.
4. In the example predicting the number of newborns, the final number of total newborns can be considered as the _________
A. Features B. Observation C. Attribute
D. Outcome
Answer: D
Explanation: In the example of predicting the total number of newborns, the result is represented as the outcome. Therefore, the total number of newborns will be found in, or addressed by, the outcome.
5. Which of the following statement is true about the classification?
A. It is a measure of accuracy B. It is a subdivision of a set C. It is the task of assigning a classification D. None of the above
Answer: B
Explanation: The term "classification" refers to the classification of the given data into certain sub-classes or groups according to their similarities or on the basis of the specific given set of rules.
6. Which one of the following correctly refers to the task of the classification?
A. A measure of the accuracy, of the classification of a concept that is given by a certain theory B. The task of assigning a classification to a set of examples C. A subdivision of a set of examples into a number of classes D. None of the above
Answer: C
Explanation: The task of classification refers to subdividing a set of examples into a number of classes. Therefore the correct answer is C.
7. _____is an observation which contains either very low value or very high value in comparison to other observed values. It may hamper the result, so it should be avoided. a. Dependent Variable b. Independent Variable c. Outlier Variable d. None of the above Ans. c
8. _______is a type of regression which models the non-linear dataset using a linear model.
a. Polynomial Regression b. Logistic Regression c. Linear Regression d. Decision Tree Regression
Ans. a
9. The prediction of the weight of a person when his height is known, is a simple example of regression. The function used in R language is_____.
a. lm() b. print() c. predict() d. summary()
Ans. c
10. There is the following syntax of lm() function in multiple regression.
lm(y ~ x1+x2+x3...., data) a. y is predictor and x1,x2,x3 are the dependent variables. b. y is dependent and x1,x2,x3 are the predictors. c. data is predictor variable. d. None of the above.
Ans. b
11. _______is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph.
a. A Bayesian network b. Bayes Network c. Bayesian Model d. All of the above
Ans. d
12. In support vector regression, _____is a function used to map lower dimensional data into higher dimensional data
A) Boundary line B) Kernel C) Hyper Plane D) Support Vector Ans. B
13. If the independent variables are highly correlated with each other, then such a condition is called ___________ a) outlier b) Multicollinearity c) under fitting d) independent variable
Ans. b
14. The Bayesian network graph does not contain any cyclic graph. Hence, it is known as a ____ or_____.
a. Directed Acyclic Graph or DAG b. Directed Cyclic Graph or DCG. c. Both the above. d. None of the above.
Ans. a
15. The hyperplane with maximum margin is called the ______ hyperplane. a. Non-optimal b. Optimal c. None of the above d. Requires one more option
Ans. b
16. One more _____ is needed for non-linear SVM.
a. Dimension b. Attribute c. Both the above d. None of the above
Ans. a
17. A subset of the dataset used to train the machine learning model, for which we already know the output, is called the
a. Training set b. Test set c. Both the above
d. None of the above
Ans. a
18. ______ is the final step of data preprocessing in machine learning. It is a technique to standardize the independent variables of the dataset in a specific range. In ______, we put our variables in the same range and on the same scale so that no variable dominates the others.
a. Feature Sampling b. Feature Scaling c. None of the above d. Both the above
Ans. b
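A minimal sketch of feature scaling (min-max normalization) in plain Python; the feature values are hypothetical. After scaling, every feature lies in the same [0, 1] range, so no single feature dominates the others.
def min_max_scale(values):
    # Rescale a list of numbers to the range [0, 1].
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [18, 25, 40, 60]                      # hypothetical feature on a small scale
salaries = [20000, 35000, 80000, 120000]     # hypothetical feature on a large scale
print(min_max_scale(ages))
print(min_max_scale(salaries))               # both features now share the same 0-1 range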
19. Principal components analysis (PCA) is a statistical technique that allows identifying underlying linear patterns in a data set so it can be expressed in terms of other data set of a significantly ____ dimension without much loss of information. a. Lower b. Higher c. Equal d. None of the above
Ans. a
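For illustration only (assuming scikit-learn is installed; the toy data is made up), PCA projects correlated 3-dimensional points onto the 2 directions of greatest variance, giving a lower-dimensional representation with little loss of information.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x = rng.normal(size=100)
# Three columns, two of which are strongly correlated with each other.
data = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=100), rng.normal(size=100)])

pca = PCA(n_components=2)          # keep the 2 directions with the most variance
reduced = pca.fit_transform(data)  # 100 x 3 -> 100 x 2
print(reduced.shape, pca.explained_variance_ratio_)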
20. _____ units which are internal to the network and do not directly interact with the environment. a. Input b. Output c. Hidden d. None of the above
Ans. c
21. In a ____ network there is an ordering imposed on the nodes: if there is a connection from unit a to unit b, then there cannot be a connection from b to a. a. Feedback b. Feed-Forward c. None of the above
Ans. b
22. _____ contains the multiple logical values and these values are the truth values of a variable or problem between 0 and 1. This concept was introduced by Lofti Zadeh in 1965 a. Boolean Logic b. Fuzzy Logic c. None of the above
Ans. b
23. ______is a module or component, which takes the fuzzy set inputs generated by the Inference Engine, and then transforms them into a crisp value. a. Fuzzification b. Defuzzification c. Inference Engine d. None of the above
Ans. b
24. The most common application of time series analysis is forecasting future values of a numeric value using the ______ structure of the ____ a. Shares,data b. Temporal,data c. Permanent,data d. None of these
Ans. b
25. Identify the component of a time series a. Temporal b. Shares c. Trend d. Policymakers
Ans. c
26. Predictable pattern that recurs or repeats over regular intervals. Seasonality is often observed within a year or less: This define the term__________ a. Trend b. Seasonality c. Cycles d. Recession
Ans. b
27. ________Learning uses a training set that consists of a set of pattern pairs: an input pattern and the corresponding desired (or target) output pattern. The desired output may be regarded as the ‘network’s ‘teacher” for that input a. Unsupervised b. Supervised c. Modular d. Object
Ans. b
28. The _______ perceptron consists of a set of input units connected by a single layer of weights to a set of output units a. Multi layer b. Single layer c. Hidden layer d. None of these
Ans. b
29. If we add another layer of weights to a single layer perceptron, then we find that there is a new set of units that are neither input nor output units; for simplicity, a network with more than 2 layers of this kind is called a a. Single layer perceptron b. Multi layer perceptron c. Hidden layer d. None of these
Ans. b
30. Patterns that repeat over a certain period of time a. Seasonal b. Trend c. None of the above d. Both of the above
Ans. a
31. Which of the following is characteristic of best machine learning method ?
a. Fast b. Accuracy c. Scalable d. All of the Mentioned
Ans. d
32. Supervised learning differs from unsupervised clustering in that supervised learning requires a. at least one input attribute. b. input attributes to be categorical. c. at least one output attribute. d. output attributes to be categorical. Ans. c
33. Supervised learning and unsupervised clustering both require at least one a. hidden attribute. b. output attribute. c. input attribute. d. categorical attribute. Ans. c
34. Which statement is true about prediction problems? a. The output attribute must be categorical. b. The output attribute must be numeric. c. The resultant model is designed to determine future outcomes. d. The resultant model is designed to classify current behavior. Ans. c
35. Which statement is true about neural network and linear regression models? a. Both models require input attributes to be numeric. b. Both models require numeric attributes to range between 0 and 1. c. The output of both models is a categorical attribute value. d. Both techniques build models whose output is determined by a linear sum of weighted input attribute values. Ans. a
36. A feed-forward neural network is said to be fully connected when a. all nodes are connected to each other. b. all nodes at the same layer are connected to each other. c. all nodes at one layer are connected to all nodes in the next higher layer. d. all hidden layer nodes are connected to all output layer nodes. Ans. c
37. Machine learning techniques differ from statistical techniques in that machine learning methods a. typically assume an underlying distribution for the data.
b. are better able to deal with missing and noisy data. c. are not able to explain their behavior. d. have trouble with large-sized datasets. Ans. b
38. This supervised learning technique can process both numeric and categorical input attributes. a. linear regression b. Bayes classifier c. logistic regression d. backpropagation learning Ans. b
39. This technique associates a conditional probability value with each data instance. a. linear regression b. logistic regression c. simple regression d. multiple linear regression Ans. b
40. Logistic regression is a ________ regression technique that is used to model data having a _____outcome. a. linear, numeric b. linear, binary c. nonlinear, numeric d. nonlinear, binary Ans. d
41. Which of the following problems is best solved using time-series analysis? a. Predict whether someone is a likely candidate for having a stroke. b. Determine if an individual should be given an unsecured loan. c. Develop a profile of a star athlete. d. Determine the likelihood that someone will terminate their cell phone contract.
Ans. d
42. Which of the following is true about Naive Bayes? a. Assumes that all the features in a dataset are equally important b. Assumes that all the features in a dataset are independent
c. Both A and B d. None of the above options Ans. c
43. Simple regression assumes a __________ relationship between the input attribute and output attribute. a. linear b. quadratic c. reciprocal d. inverse
Ans. a
44. With Bayes classifier, missing data items are a. treated as equal compares. b. treated as unequal compares. c. replaced with a default value. d. ignored.
45. What is Machine learning? a. The autonomous acquisition of knowledge through the use of computer programs b. The autonomous acquisition of knowledge through the use of manual programs c. The selective acquisition of knowledge through the use of computer programs d. The selective acquisition of knowledge through the use of manual programs
Ans: a
46. Automated vehicle is an example of ______ a. Supervised learning b. Unsupervised learning c. Active learning d. Reinforcement learning
Ans: a
47. Multilayer perceptron network is a. Usually, the weights are initially set to small random values b. A hard-limiting activation function is often used c. The weights can only be updated after all the training vectors have been presented d. Multiple layers of neurons allow for less complex decision boundaries than a single layer
Ans: a
48. Neural networks a. optimize a convex cost function b. cannot be used for regression as well as classification c. always output values between 0 and 1 d. can be used in an ensemble
Ans: d
49. In neural networks, nonlinear activation functions such as sigmoid, tanh, and ReLU a. speed up the gradient calculation in backpropagation, as compared to linear units b. are applied only to the output units c. help to learn nonlinear decision boundaries d. always output values between 0 and 1
Ans: c
50. Which of the following is a disadvantage of decision trees?
a. Factor analysis b. Decision trees are robust to outliers c. Decision trees are prone to be overfit d. None of the above
Ans: c
51. Back propagation is a learning technique that adjusts weights in the neural network by propagating weight changes. a. Forward from source to sink b. Backward from sink to source c. Forward from source to hidden nodes d. Backward from sink to hidden nodes
Ans: b
52. Identify the following activation function: φ(V) = Z + 1 / (1 + exp(–X·V + Y)), where Z, X, Y are parameters.
a. Step function b. Ramp function c. Sigmoid function
d. Gaussian function
Ans: c
53. An artificial neuron receives n inputs x1, x2, x3, ..., xn with weights w1, w2, ..., wn attached to the input links. The weighted sum _________________ is computed and passed on to a non-linear filter Φ, called the activation function, to release the output. a. Σ wi b. Σ xi c. Σ wi + Σ xi d. Σ wi * xi
Ans: d
54. With Bayes classifier, missing data items are a. treated as equal compares. b. treated as unequal compares. c. replaced with a default value. d. ignored.
Ans:b
55. Machine learning techniques differ from statistical techniques in that machine learning methods a. typically assume an underlying distribution for the data. b. are better able to deal with missing and noisy data. c. are not able to explain their behavior. d. have trouble with large-sized datasets.
Ans: b
56. Which of the following is true about Naive Bayes?
a. Assumes that all the features in a dataset are equally important b. Assumes that all the features in a dataset are independent c. Both a and b d. None of the above options
Ans: c
57. How many terms are required for building a Bayes model?
a. 1 b. 2 c. 3 d. 4
Ans: c
58. What does the Bayesian network provides? a. Complete description of the domain b. Partial description of the domain c. Complete description of the problem d. None of the mentioned
Ans: a
59. How the Bayesian network can be used to answer any query? a. Full distribution b. Joint distribution c. Partial distribution d. All of the mentioned
Ans: b
60. In which of the following learning the teacher returns reward and punishment to learner? a. Active learning b. Reinforcement learning c. Supervised learning d. Unsupervised learning
Ans: b
61. Which of the following is the model used for learning? a. Decision trees b. Neural networks c. Propositional and FOL rules d. All of the mentioned
Ans: d
KIET Group of Institutions
Department of IT
Course: B.Tech., VI Sem, MCQ Assignment (2020-21), Even Semester, Unit 3, Data Analytics (KIT601)
Q.1 Which attribute is _not_ indicative for data streaming?
A) Limited amount of memory
B) Limited amount of processing time
C) Limited amount of input data
D) Limited amount of processing power
Ans. C
Q.2 Which of the following statements about data streaming is true?
A) Stream data is always unstructured data.
B) Stream data often has a high velocity.
C) Stream elements cannot be stored on disk.
D) Stream data is always structured data.
Ans. B
Q.3 What is the main difference between standard reservoir sampling and min-wise sampling?
A) Reservoir sampling makes use of randomly generated numbers whereas minwise sampling does not.
B) Min-wise sampling makes use of randomly generated numbers whereas reservoir sampling does not.
C) Reservoir sampling requires a stream to be processed sequentially, whereas minwise does not.
D) For larger streams, reservoir sampling creates more accurate samples than minwise sampling.
Ans. C)
Q.4 A Bloom filter guarantees no
A) false positives
B) false negatives
C) false positives and false negatives
D) false positives or false negatives, depending on the Bloom filter type
Ans. B)
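To see why a Bloom filter can produce false positives but never false negatives, the toy sketch below (illustrative only; the hash choices and parameters are assumptions, not a production design) sets k bit positions for every inserted element, so a membership test on an inserted element always finds all of its bits set, while an unseen element may accidentally collide with bits set by others.
import hashlib

class BloomFilter:
    def __init__(self, m=64, k=3):
        self.m, self.k = m, k
        self.bits = [0] * m                      # bit array initialized to all 0s

    def _positions(self, item):
        # Derive k positions from a cryptographic hash (a simple illustrative choice).
        digest = hashlib.sha256(item.encode()).hexdigest()
        return [int(digest[i*8:(i+1)*8], 16) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # TRUE only means "maybe present" (possible false positive);
        # FALSE means definitely "not present" (no false negatives).
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"))    # always True for a previously added element
print(bf.might_contain("mallory"))  # usually False, occasionally a false positive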
Q.5 Which of the following statements about standard Bloom filters is correct?
A) It is possible to delete an element from a Bloom filter.
B) A Bloom filter always returns the correct result.
C) It is possible to alter the hash functions of a full Bloom filter to create more space.
D) A Bloom filter always returns TRUE when testing for a previously added element.
Ans. D)
Q.6 The DGIM algorithm was developed to estimate the count of 1's occurring within the last k bits of a stream window N. Which of the following statements is true about the estimate of the number of 0's based on DGIM?
A) The number of 0's cannot be estimated at all.
B) The number of 0's can be estimated with a maximum guaranteed error.
C) To estimate the number of 0's and 1's with a guaranteed maximum error, DGIM has to be employed twice: once creating buckets based on 1's, and once creating buckets based on 0's.
D) None of these
Ans. B)
Q.7 Which of the following statements about the standard DGIM algorithm are false?
A)DGIM operates on a time-based window.
B) DGIM reduces memory consumption through a clever way of storing counts.
C) In DGIM, the size of a bucket is always a power of two.
D) The maximum number of buckets has to be chosen beforehand.
Ans. D)
Q.8 Which of the following statements about the standard DGIM algorithm are false?
A)DGIM operates on a time-based window.
B) DGIM reduces memory consumption through a clever way of storing counts.
C) In DGIM, the size of a bucket is always a power of two.
D) The buckets contain the count of 1's and each 1's specific position in the stream
Ans. D)
Q.9 What are DGIM’s maximum error boundaries? A) DGIM always underestimates the true count; at most by 25%
B) DGIM either underestimates or overestimates the true count; at most by 50%
C) DGIM always overestimates the count; at most by 50%
D) DGIM either underestimates or overestimates the true count; at most by 25%
Ans. B)
Q.10 Which algorithm should be used to approximate the number of distinct elements in a data stream?
A) Misra-Gries
B) Alon-Matias-Szegedy
C) DGIM
D) None of the above
Ans. D)
Q.11 Which algorithm should be used to approximate the number of distinct elements in a data stream?
A) Misra-Gries
B) Alon-Matias-Szegedy
C) DGIM
D) Flajolet and Martin
Ans. D)
Q.12 Which of the following streaming windows show valid bucket representations according to the DGIM rules?
A) 1 0 1 1 1 0 1 0 1 1 1 1 0 1 0 1
B) 1 0 1 1 1 0 0 0 0 1 1 0 0 0 1 0 1 1 1 0 0 1
C) 1 1 1 1 0 0 1 1 1 0 1 0 1
D) 1 0 1 1 0 0 0 1 0 1 1 1 0 1 1 0 0 1 0 1 1
Ans. D)
Q.13 For which of the following streams is the second-order moment F2 greater than 45?
A) 10 5 5 10 10 10 1 1 1 10
B) 10 10 10 10 10 5 5 5 5 5
C) 1 1 1 1 1 5 10 10 5 1
D) None of these
Ans. B)
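For reference, the second-order moment (surprise number) F2 is the sum over all distinct elements of the square of their occurrence counts; for option B, both 10 and 5 appear five times, so F2 = 5^2 + 5^2 = 50 > 45. The snippet below is a small illustrative check that computes F2 for each listed stream.
from collections import Counter

def second_moment(stream):
    # F2 = sum of squared frequencies of the distinct elements.
    return sum(c * c for c in Counter(stream).values())

streams = {
    "A": [10, 5, 5, 10, 10, 10, 1, 1, 1, 10],
    "B": [10, 10, 10, 10, 10, 5, 5, 5, 5, 5],
    "C": [1, 1, 1, 1, 1, 5, 10, 10, 5, 1],
}
for name, s in streams.items():
    print(name, second_moment(s))   # A: 38, B: 50, C: 44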
Q.14 For which of the following streams is the second-order moment F2 greater than 45?
A) 10 5 5 10 10 10 1 1 1 10
B) 10 10 10 10 10 10 10 10 10 10
C) 1 1 1 1 1 5 10 10 5 1
D) None of these
Ans. B)
Q 15 : In Bloom filter an array of n bits is initialized with
A) all 0s
B) all 1s
C) half 0s and half 1s
D) all -1
Ans. A)
Q 16. Pick a hash function h that maps each of the N elements to at least log2 N bits. If R is the maximum number of trailing 0's observed in any hash value, the estimated number of distinct elements is
A) 2^R
B) 2^(-R)
C) 1-(2^R)
D) 1-(2^(-R))
Ans. A)
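The idea behind this estimate is the Flajolet-Martin approach: hash every element, record R, the largest number of trailing zero bits seen in any hash value, and estimate the number of distinct elements as 2^R. The sketch below is illustrative only; the hash function choice and the sample stream are assumptions.
import hashlib

def trailing_zeros(n):
    # Count trailing zero bits of an integer (treat 0 as having no usable tail here).
    if n == 0:
        return 0
    count = 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count

def fm_estimate(stream):
    r = 0
    for item in stream:
        h = int(hashlib.md5(str(item).encode()).hexdigest(), 16)
        r = max(r, trailing_zeros(h))
    return 2 ** r            # estimated number of distinct elements

stream = [1, 2, 3, 2, 1, 4, 5, 3, 6, 7, 8, 2]   # hypothetical stream with 8 distinct values
print(fm_estimate(stream))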
Q.17 Sliding window operations typically fall in the category
A) OLTP Transactions
B) Big Data Batch Processing
C) Big Data Real Time Processing
D) Small Batch Processing
Ans. C)
Q.18 What is finally produced by Hierarchical Agglomerative Clustering?
A) final estimate of cluster centroids
B)assignment of each point to clusters
C) tree showing how close things are to each other
D) Group of clusters
Ans. C)
Q19 Which of the following algorithms can be used for counting 1's in a stream?
A) FM Algorithm
B) PCY Algorithm
C) DGIM Algorithm
D) SON Algorithm
Ans. C)
Q20 Which technique is used to filter unnecessary itemsets in the PCY algorithm?
A) Association Rule
B) Hashing Technique
C) Data Mining
D) Market basket
Ans. B)
Q21 In association rule, which of the following indicates the measure of how frequently the items occur in a dataset ?
A) Support B) Confidence C) Basket D) Itemset
Ans. A)
Q.22 Which of the following clustering techniques is used by the K-Means algorithm?
A) Hierarchical Technique
B) Partitional technique
C)Divisive
D) Agglomerative
Ans. B)
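As a brief illustration of why K-Means is a partitional technique (each point is assigned to exactly one of k flat clusters rather than to a hierarchy), here is a sketch assuming scikit-learn is available; the points are made up.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
                   [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])   # two obvious groups

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(km.labels_)            # each point gets exactly one cluster label (a partition)
print(km.cluster_centers_)   # one centroid per cluster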
Q.23 Which of the following clustering techniques is used by the Agglomerative Nesting (AGNES) algorithm?
A) Hierarchical Technique
B) Partitional technique
C) Density based
D) None of these
Ans. A)
Q24. Which of the following hierarchical approaches begins with each observation in a distinct (singleton) cluster, and successively merges clusters together until a stopping criterion is satisfied?
A) Divisive
B) Agglomerative
C) Single Link
D) Complete Link
Ans. B)
Q.25 The Park, Chen, Yu algorithm is useful for __________ in Big Data applications.
A) Find Frequent Itemset
B) Filtering Stream
C) Distinct Element Find
D) None of these
Ans. A)
Q.26 Match the following:
a) Bloom filter        i) Frequent Pattern Mining
b) FM Algorithm        ii) Filtering Stream
c) PCY Algorithm       iii) Distinct Element Find
d) DGIM Algorithm      iv) Counting 1's in window
A) a-ii), b-iii), c-i), d-iv)
B) a-iii), b-ii), c-i), d-iv)
C) a-i), b-iii), c-ii), d-iv)
D) None of these
Ans. A)
SET II
1. Which of the following can be considered as the correct process of Data Mining? a. Infrastructure, Exploration, Analysis, Interpretation, Exploitation b. Exploration, Infrastructure, Analysis, Interpretation, Exploitation c. Exploration, Infrastructure, Interpretation, Analysis, Exploitation d. Exploration, Infrastructure, Analysis, Exploitation, Interpretation
Answer: a
Explanation: The process of data mining contains many sub-processes in a specific order. The correct order in which all sub-processes of data mining executes is Infrastructure, Exploration, Analysis, Interpretation, and Exploitation.
2. Which of the following is an essential process in which the intelligent methods are applied to extract data patterns? a. Warehousing b. Data Mining c. Text Mining d. Data Selection
Answer: b
Explanation: Data mining is a type of process in which several intelligent methods are used to extract meaningful data from the huge collection (or set) of data.
3. What are the functions of Data Mining? a. Association and correlation analysis, classification b. Prediction and characterization
c. Cluster analysis and Evolution analysis d. All of the above
Answer: d
Explanation: In data mining, several functionalities are used for performing different types of tasks. The common functionalities used in data mining are cluster analysis, prediction, characterization, and evolution analysis; association and correlation analysis and classification are also important functionalities of data mining.
4. Which attribute is _not_ indicative for data streaming?
a. Limited amount of memory b. Limited amount of processing time c. Limited amount of input data d. Limited amount of processing power
Ans. c
5. Which of the following statements about data streaming is true?
a. Stream data is always unstructured data. b. Stream data often has a high velocity. c. Stream elements cannot be stored on disk. d. Stream data is always structured data.
Ans. b
6. Which of the following statements about sampling are correct? a. Sampling reduces the amount of data fed to a subsequent data mining algorithm b. Sampling reduces the diversity of the data stream c. Sampling increases the amount of data fed to a data mining algorithm d. Sampling algorithms often need multiple passes over the data
Ans. a
7. Which of the following statements about sampling are correct? a. Sampling reduces the diversity of the data stream
b. Sampling increases the amount of data fed to a data mining algorithm c. Sampling algorithms often need multiple passes over the data d. Sampling aims to keep statistical properties of the data intact
Ans. d
8. What is the main difference between standard reservoir sampling and min-wise sampling?
a. Reservoir sampling makes use of randomly generated numbers whereas min-wise sampling does not. b. Min-wise sampling makes use of randomly generated numbers whereas reservoir sampling does not. c. Reservoir sampling requires a stream to be processed sequentially, whereas min-wise does not. d. For larger streams, reservoir sampling creates more accurate samples than min-wise sampling.
Ans. c
9. A Bloom filter guarantees no
a. false positives b. false negatives c. false positives and false negatives d. false positives or false negatives, depending on the Bloom filter type
Ans. b
10. Which of the following statements about standard Bloom filters is correct?
a. It is possible to delete an element from a Bloom filter. b. A Bloom filter always returns the correct result. c. It is possible to alter the hash functions of a full Bloom filter to create more space. d. A Bloom filter always returns TRUE when testing for a previously added element.
Ans. d
11. The FM-sketch algorithm uses the number of zeros the binary hash value ends in to make an estimation. Which of the following statements is true about the hash tail?
a. Any specific bit pattern is equally suitable to be used as hash tail.
b. Only bit patterns with more 0's than 1's are equally suitable to be used as hash tails. c. Only the bit patterns 0000000..00 (list of 0s) or 111111..11 (list of 1s) are suitable hash tails. d. Only the bit pattern 0000000..00 (list of 0s) is a suitable hash tail.
Ans. a
12. The FM-sketch algorithm can be used to:
a. Estimate the number of distinct elements. b. Sample data with a time-sensitive window. c. Estimate the frequent elements. d. Determine whether an element has already occurred in previous stream data.
Ans. a
13. The DGIM algorithm was developed to estimate the count of 1's occurring within the last k bits of a stream window N. Which of the following statements is true about the estimate of the number of 0's based on DGIM?
a. The number of 0's cannot be estimated at all. b. The number of 0's can be estimated with a maximum guaranteed error. c. To estimate the number of 0s and 1s with a guaranteed maximum error, DGIM has to be employed twice, one creating buckets based on 1's, and once created buckets based on 0's. d. None of above
Ans. b
14. Which of the following statements about the standard DGIM algorithm are false? a. DGIM operates on a time-based window b. DGIM reduces memory consumption through a clever way of storing counts c. In DGIM, the size of a bucket is always a power of two d. The maximum number of buckets has to be chosen beforehand. Ans. d
15. Which of the following statements about the standard DGIM algorithm are false? a. DGIM operates on a time-based window b. The buckets contain the count of 1's and each 1's specific position in the stream c. DGIM reduces memory consumption through a clever way of storing counts
d. In DGIM, the size of a bucket is always a power of two Ans. b
16. What are DGIM’s maximum error boundaries?
a. DGIM always underestimates the true count; at most by 25% b. DGIM either underestimates or overestimates the true count; at most by 50% c. DGIM always overestimates the count; at most by 50% d. DGIM either underestimates or overestimates the true count; at most by 25%
Ans. b
17. Which algorithm should be used to approximate the number of distinct elements in a data stream?
a. Misra-Gries b. Alon-Matias-Szegedy c. DGIM d. None of the above
Ans. d
18. Which of the following statements about Bloom filters are correct?
a. A Bloom filter has the same properties as a standard Hashmap data structure in Java (java.util.HashMap). b. A Bloom filter is full if no more hash functions can be added to it. c. A Bloom filter always returns FALSE when testing for an element that was not previously added d. A Bloom filter always returns TRUE when testing for a previously added element
Ans. d
19. Which of the following statements about Bloom filters are correct?
a. An empty Bloom filter (no elements added to it) will always return FALSE when testing for an element b. A Bloom filter has the same properties as a standard Hashmap data structure in Java (java.util.HashMap). c. A Bloom filter is full if no more hash functions can be added to it.
d. A Bloom filter always returns FALSE when testing for an element that was not previously added Ans. a
20. Which of the following streaming windows show valid bucket representations according to the DGIM rules?
a. 1 0 1 1 1 0 1 0 1 1 1 1 0 1 0 1 b. 1 0 1 1 1 0 0 0 0 1 1 0 0 0 1 0 1 1 1 0 0 1 c. 1 1 1 1 0 0 1 1 1 0 1 0 1 d. 1 0 1 1 0 0 0 1 0 1 1 1 0 1 1 0 0 1 0 1 1
Ans. d
21. For which of the following streams is the second-order moment F2 greater than 45?
a. 10 5 5 10 10 10 1 1 1 10 b. 10 10 10 10 10 5 5 5 5 5 c. 1 1 1 1 1 5 10 10 5 1 d. 10 10 10 10 10 10 10 10 10 10
Ans. b and d
22. What is the space complexity of the FREQUENT algorithm? Recall that it aims to find all elements in a sequence whose frequency exceeds 1/k of the total count. In the expressions below, n is the maximum value of each key and m is the maximum value of each counter.
a. O(k(log m + log n)) b. o(k(log m + log n)) c. O(log k(m + n)) d. o(log k(m + n))
Ans. a
Suppose that to get some information about something, you write a keyword in Google search. Google's analytical tools will then pre-compute large amounts of data to provide a quick output related to the keywords you have written.
19) Which of the following statements is correct about data mining?
a. It can be referred to as the procedure of mining knowledge from data b. Data mining can be defined as the procedure of extracting information from a set of the data c. The procedure of data mining also involves several other processes like data cleaning, data transformation, and data integration d. All of the above
Answer: d
Explanation: The term data mining can be defined as the process of extracting information from the massive collection of data. In other words, we can also say that data mining is the procedure of mining useful knowledge from a huge set of data.
25) The classification of the data mining system involves:
a. Database technology b. Information Science c. Machine learning d. All of the above
Answer: d
Explanation: Generally, the classification of a data mining system depends on the following criteria: Database technology, machine learning, visualization, information science, and several other disciplines.
27) The issues like efficiency, scalability of data mining algorithms comes under_______
a. Performance issues b. Diverse data type issues c. Mining methodology and user interaction d. All of the above
Answer: a
Explanation: In order to extract information effectively from a huge collection of data in databases, the data mining algorithm must be efficient and scalable. Therefore the correct answer is A.
KIET Group of Institutions
Department of IT
Course: B.Tech., VI Sem, MCQ Assignment (2020-21), Even Semester, Unit 4, Data Analytics (KIT601)
1. What does Apriori algorithm do? a. It mines all frequent patterns through pruning rules with lesser support b. It mines all frequent patterns through pruning rules with higher support c. Both a and b d. None of these
Ans. a
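A compact sketch of the Apriori idea described in this question (illustrative only; the transactions and the min_support value are assumptions): candidate itemsets are grown level by level, and any candidate whose support falls below the threshold is pruned, which implicitly prunes all of its supersets as well.
from itertools import combinations

transactions = [{"milk", "bread"}, {"milk", "diapers", "beer"},
                {"milk", "bread", "diapers"}, {"bread", "diapers"}]
min_support = 2   # absolute (count-based) support threshold

def support(itemset):
    return sum(1 for t in transactions if itemset <= t)

# Level 1: frequent single items.
items = {i for t in transactions for i in t}
frequent = [{frozenset([i]) for i in items if support(frozenset([i])) >= min_support}]

# Grow candidates level by level; infrequent itemsets are pruned along with their supersets.
k = 2
while frequent[-1]:
    prev = frequent[-1]
    candidates = {a | b for a in prev for b in prev if len(a | b) == k}
    frequent.append({c for c in candidates if support(c) >= min_support})
    k += 1

for level in frequent[:-1]:
    print(sorted(tuple(sorted(s)) for s in level))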
2. What techniques can be used to improve the efficiency of apriori algorithm? a. hash based techniques b. transaction reduction c. Partitioning d. All of these
Ans. d
3. What do you mean by support(A)?
a. Total number of transactions containing A b. Total Number of transactions not containing A c. Number of transactions containing A / Total number of transactions d. Number of transactions not containing A / Total number of transactions
Ans. c
4. Which of the following is a direct application of frequent itemset mining? a. Social Network Analysis b. Market Basket Analysis c. Outlier detection
d. intrusion detection
Ans. b
5. When do you consider an association rule interesting? a. If it only satisfies min_support b. If it only satisfies min_confidence c. If it satisfies both min_support and min_confidence d. There are other measures to check as well
Ans. c
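To make the support and confidence measures concrete (an illustrative sketch with made-up transactions, as in the answer above): support(A→B) is the fraction of transactions containing both A and B, and confidence(A→B) is support(A and B) divided by support(A); a rule is usually kept only when both exceed their minimum thresholds.
transactions = [{"milk", "bread"}, {"milk", "diapers"},
                {"milk", "bread", "diapers"}, {"bread", "diapers"}]

def support(itemset):
    # Relative support: fraction of transactions containing the itemset.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    # confidence(A -> B) = support(A and B) / support(A)
    return support(antecedent | consequent) / support(antecedent)

rule_a, rule_b = {"milk"}, {"bread"}
print("support:", support(rule_a | rule_b))        # 2/4 = 0.5
print("confidence:", confidence(rule_a, rule_b))   # 0.5 / 0.75 = 0.666...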
6. What is the difference between absolute and relative support? a. Absolute -Minimum support count threshold and Relative-Minimum support threshold b. Absolute-Minimum support threshold and Relative-Minimum support count threshold c. Both a and b d. None of these
Ans. a
7. What is the relation between candidate and frequent itemsets?
a. A candidate itemset is always a frequent itemset b. A frequent itemset must be a candidate itemset c. No relation between the two d. None of these
Ans. b
8. What is the principle on which the Apriori algorithm works?
a. If a rule is infrequent, its specialized rules are also infrequent b. If a rule is infrequent, its generalized rules are also infrequent c. Both a and b d. None of these
Ans. a
9. Which of these is not a frequent pattern mining algorithm a. Apriori b. FP growth c. Decision trees d. Eclat
Ans. c
10. What are closed frequent itemsets?
a. A closed itemset b. A frequent itemset c. An itemset which is both closed and frequent d. None of these
Ans. c
11. What are maximal frequent itemsets? a. A frequent item set whose no super-itemset is frequent b. A frequent itemset whose super-itemset is also frequent c. Both a and b d. None of these
Ans. a
12. What is association rule mining?
a. Same as frequent itemset mining b. Finding of strong association rules using frequent itemsets c. Both a and b d. None of these
Ans. b
13. What is frequent pattern growth?
a. Same as frequent itemset mining b. Use of hashing to make discovery of frequent itemsets more efficient c. Mining of frequent itemsets without candidate generation d. None of these
Ans. c
14. When is sub-itemset pruning done?
a. A frequent itemset ‘P’ is a proper subset of another frequent itemset ‘Q’ b. Support (P) = Support(Q) c. When both a and b is true d. When a is true and b is not
Ans. c
15. Our use of association analysis will yield the same frequent itemsets and strong association rules whether a specific item occurs once or three times in an individual transaction
a. TRUE b. FALSE c. Both a and b d. None of these
Ans. a
16. The number of iterations in apriori __
a. increases with the size of the data b. decreases with the increase in size of the data c. increases with the size of the maximum frequent set d. decreases with increase in size of the maximum frequent set
Ans. c
17. Frequent item sets are a. Superset of only closed frequent item sets b. Superset of only maximal frequent item sets c. Subset of maximal frequent item sets d. Superset of both closed frequent item sets and maximal frequent item sets
Ans. d
18. Significant Bottleneck in the Apriori algorithm is a. Finding frequent itemsets b. pruning c. Candidate generation d. Number of iterations
Ans. c
19. Which Association Rule would you prefer a. High support and medium confidence b. High support and low confidence c. Low support and high confidence d. Low support and low confidence
Ans. c
20. The apriori property means a. If a set cannot pass a test, its supersets will also fail the same test b. To decrease the efficiency, do level-wise generation of frequent item sets c. To improve the efficiency, do level-wise generation of frequent item sets d. If a set can pass a test, its supersets will fail the same test
Ans. a
21. To determine association rules from frequent item sets a. Only minimum confidence needed b. Neither support not confidence needed c. Both minimum support and confidence are needed d. Minimum support is needed
Ans. c
22. A collection of one or more items is called as _____
( a ) Itemset ( b ) Support ( c ) Confidence ( d ) Support Count Ans. a
23. Frequency of occurrence of an itemset is called as _____
(a) Support (b) Confidence (c) Support Count (d) Rules Ans. c
24. An itemset whose support is greater than or equal to a minimum support threshold is ______
(a) Itemset (b) Frequent Itemset (c) Infrequent items (d) Threshold values
Ans. b
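The same ideas can be tried end to end in R with the arules package (a sketch assuming arules is installed; the baskets below are made up for illustration):

library(arules)                      # provides the transactions class, apriori() and eclat()
baskets <- list(
  c("milk", "bread"),
  c("milk", "bread", "butter"),
  c("bread", "butter"),
  c("milk", "butter")
)
trans <- as(baskets, "transactions")
# Mine rules that satisfy both minimum support and minimum confidence
rules <- apriori(trans, parameter = list(supp = 0.5, conf = 0.7))
inspect(rules)                       # the strong association rules
freq <- eclat(trans, parameter = list(supp = 0.5))
inspect(freq)                        # the frequent itemsets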
25. The goal of clustering is to- a. Divide the data points into groups b. Classify the data point into different classes c. Predict the output values of input data points d. All of the above
Ans. a
26. Clustering is a- a. Supervised learning b. Unsupervised learning c. Reinforcement learning d. None Ans. b 27. Which of the following clustering algorithms suffers from the problem of convergence at local optima? a. K-Means clustering b. Hierarchical clustering c. Diverse clustering d. All of the above Ans. a
28. Which version of the clustering algorithm is most sensitive to outliers? a. K-means clustering algorithm b. K-modes clustering algorithm c. K-medians clustering algorithm d. None
Ans. a 29. Which of the following is a bad characteristic of a dataset for clustering analysis-
a. Data points with outliers b. Data points with different densities c. Data points with non-convex shapes d. All of the above Ans. d
30. For clustering, we do not require- a. Labeled data b. Unlabeled data c. Numerical data d. Categorical data
Ans. a 31. The final output of Hierarchical clustering is- a. The number of cluster centroids b. The tree representing how close the data points are to each other c. A map defining the similar data points into individual groups d. All of the above Ans. b
32. Which of the step is not required for K-means clustering?
a. a distance metric b. initial number of clusters c. initial guess as to cluster centroids d. None Ans. d
33. Which of the following uses merging approach? a. Hierarchical clustering b. Partitional clustering c. Density-based clustering d. All of the above Ans. a 34. When does k-means clustering stop creating or optimizing clusters? a. After finding no new reassignment of data points b. After the algorithm reaches the defined number of iterations c. Both A and B d. None Ans. c 35. Which of the following clustering algorithm follows a top to bottom approach? a. K-means b. Divisible c. Agglomerative d. None Ans. b 36. Which algorithm does not require a dendrogram? a. K-means b. Divisible c. Agglomerative d. None
Ans. a 37. What is a dendrogram?
a. A hierarchical structure b. A diagram structure c. A graph structure d. None
Ans. a
38. Which one of the following can be considered as the final output of the hierarchal type of clustering? a. A tree which displays how the close thing are to each other b. Assignment of each point to clusters c. Finalize estimation of cluster centroids d. None of the above
Ans. a
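As a quick illustration of the clustering questions above (k-means needs a distance metric, an initial number of clusters and initial centroids; hierarchical clustering returns a dendrogram), here is a short base-R sketch on the built-in iris data:

data(iris)
x <- iris[, 1:4]                 # numeric features only; k-means needs numeric data
set.seed(1)                      # results depend on the random initial centroids
km <- kmeans(x, centers = 3)     # partition the rows into 3 clusters
table(km$cluster)                # sizes of the clusters

hc <- hclust(dist(x))            # agglomerative (bottom-up) hierarchical clustering
plot(hc)                         # the dendrogram: a tree showing how close the points are
cutree(hc, k = 3)                # cut the tree to obtain 3 flat clusters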
39. Which one of the following statements about the K-means clustering is incorrect?
a. The goal of the k-means clustering is to partition (n) observation into (k) clusters b. K-means clustering can be defined as the method of quantization c. The nearest neighbor is the same as the K-means d. All of the above
Ans. c
40. The self-organizing maps can also be considered as the instance of _________ type of learning.
a. Supervised learning b. Unsupervised learning c. Missing data imputation d. Both A & C
Ans. b
41. Euclidean distance measure can also be defined as ___________
a. The process of finding a solution for a problem simply by enumerating all possible solutions according to some predefined order and then testing them
b. The distance between two points as calculated using the Pythagoras theorem c. A stage of the KDD process in which new data is added to the existing selection. d. All of the above
Ans. b
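In n dimensions the Euclidean distance is d(p, q) = sqrt((p1 - q1)^2 + ... + (pn - qn)^2), i.e. the Pythagorean formula generalized. A two-line R check with illustrative points:

p <- c(1, 2, 3); q <- c(4, 6, 3)
sqrt(sum((p - q)^2))     # 5, the familiar 3-4-5 right triangle
dist(rbind(p, q))        # dist() uses the Euclidean distance by default and also gives 5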
42. Which of the following refers to the sequence of pattern that occurs frequently?
a. Frequent sub-sequence b. Frequent sub-structure c. Frequent sub-items d. All of the above
Ans. a 43. Which method of analysis does not classify variables as dependent or independent? a) Regression analysis b) Discriminant analysis c) Analysis of variance d) Cluster analysis Answer: (d)
1. The Process of describing the data that is huge and complex to store and process is known as
a. Analytics b. Data mining c. Big Data d. Data Warehouse
Ans C
2. Data generated from online transactions is one of the examples of the volume of big data. Is this True or False? a. TRUE b. FALSE
Ans. a 3. Velocity is the speed at which the data is processed
a. TRUE b. FALSE
Ans. b
4. _____________ have a structure but cannot be stored in a database.
a. Structured b. Semi-Structured c. Unstructured d. None of these
Ans. b 5. ____________ refers to the ability to turn your data into something useful for business.
a. Velocity b. Variety c. Value d. Volume
Ans. C
6. Value tells the trustworthiness of data in terms of quality and accuracy.
a. TRUE b. FALSE
Ans. b 7. GFS consists of a ____________ Master and ___________ Chunk Servers a. Single, Single b. Multiple, Single c. Single, Multiple
d. Multiple, Multiple
Ans. c
8. Files are divided into ____________ sized Chunks. a. Static b. Dynamic c. Fixed d. Variable Ans. c
9. ____________is an open source framework for storing data and running application on clusters of commodity hardware. a. HDFS b. Hadoop c. MapReduce d. Cloud Ans. B
10. HDFS Stores how much data in each clusters that can be scaled at any time? a. 32 b. 64 c. 128 d. 256 Ans. c
11. Hadoop MapReduce allows you to perform distributed parallel processing on large volumes of data quickly and efficiently. Is this statement True or False? a. TRUE b. FALSE Ans. a
12. Hortonworks was introduced by Cloudera and owned by Yahoo. a. TRUE b. FALSE Ans. b
13. Hadoop YARN is used for Cluster Resource Management in Hadoop Ecosystem. a. TRUE b. FALSE Ans. a
14. Google Introduced MapReduce Programming model in 2004. a. TRUE b. FALSE Ans. A
15.______________ phase sorts the data & ____________creates logical clusters. a. Reduce, YARN b. MAP, YARN c. REDUCE, MAP d. MAP, REDUCE Ans. d
16. There is only one operation between Mapping and Reducing is it True or False...
a. TRUE b. FALSE
Ans. A
17. __________ is one of the factors considered before adopting Big Data technology. a. Validation b. Verification c. Data d. Design Ans. a
18. _________ for improving supply chain management to optimize stock management, replenishment, and forecasting; a. Descriptive b. Diagnostic c. Predictive d. Prescriptive Ans. c
19. which among the following is not a Data mining and analytical applications? a. profile matching b. social network analysis c. facial recognition d. Filtering Ans. d
20. ________________ occurs as a result of data accessibility, data latency, data availability, or limits on bandwidth in relation to the size of inputs. a. Computation-restricted throttling b. Large data volumes c. Data throttling d. Benefits from data parallelization Ans. c
21. As an example, an expectation of using a recommendation engine would be to increase same-customer sales by adding more items into the market basket. a. Lowering costs b. Increasing revenues c. Increasing productivity d. Reducing risk Ans. b
22. Which storage subsystem can support massive data volumes of increasing size. a. Extensibility b. Fault tolerance c. Scalability d. High-speed I/O capacity Ans. c
23. ______________provides performance through distribution of data and fault tolerance through replication a. HDFS b. PIG c. HIVE d. HADOOP
Ans. a
24. ______________ is a programming model for writing applications that can process Big Data in parallel on multiple nodes. a. HDFS b. MAP REDUCE c. HADOOP d. HIVE Ans. b
25. _____________________ takes the grouped key-value paired data as input and runs a Reducer function on each one of them. a. MAPPER b. REDUCER c. COMBINER d. PARTITIONER Ans. b
26. _______________ is a type of local Reducer that groups similar data from the map phase into identifiable sets. a. MAPPER b. REDUCER c. COMBINER d. PARTITIONER. Ans. c
27. MongoDB is __________________ a. Column Based b. Key Value Based c. Document Based d. Graph Based Ans. c
28. ____________ is the process of storing data records across multiple machines a. Sharding b. HDFS c. HIVE d. HBASE Ans. a
29. The results of a hive query can be stored as a. Local File b. HDFS File c. Both d. Cannot be stored Ans. c 30. The position of a specific column in a Hive table a. can be anywhere in the table creation clause b. must match the position of the corresponding data in the data file c. Must match the position only for date time data type in the data file d. Must be arranged alphabetically Ans. b 31. The Hbase tables are A. Made read only by setting the read-only option B. Always writeable
C. Always read-only D. Are made read only using the query to the table
Ans. a 32. Hbase creates a new version of a record during A. Creation of a record B. Modification of a record C. Deletion of a record D. All the above Ans. d 33. Which among the following are incorrect in regards with NoSQL? a. Its Easy and ready to manage with clusters. b. Suitable for upcoming data explosions. c. It requires to keep track with data structure d. Provide easy and flexible system. Ans. c 34. Which Database Administrator job was in trends with job trends? a. MongoDB b. CouchDB c. SimpleDB d. Redis Ans. a 35. No SQL Means _________________ a. Not SQL b. No Usage of SQl c. Not Only SQL d. Not for SQL Ans. c 36. A list of 5 pulse rates is: 70, 64, 80, 74, 92. What is the median for this list? a. 74 b. 76 c. 77 d. 80 Ans. a 37. Which of the following would indicate that a dataset is not bell-shaped? a. The range is equal to 5 standard deviations. b. The range is larger than the interquartile range. c. The mean is much smaller than the median. d. There are no outliers Ans. c 38. What is the effect of an outlier on the value of a correlation coefficient? a. An outlier will always decrease a correlation coefficient. b. An outlier will always increase a correlation coefficient. c. An outlier might either decrease or increase a correlation coefficient, depending on where it is in relation to the other points. d. An outlier will have no effect on a correlation coefficient. Ans. c 39. One use of a regression line is a. to determine if any x-values are outliers. b. to determine if any y-values are outliers. c. to determine if a change in x causes a change in y. d. to estimate the change in y for a one-unit change in x. Ans. d 40. Which package contains most of the basic function in R. a. Root b. Basic c. Parent
d. R
Ans. b
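The statistics questions at the end of this set (the median of the pulse rates, the effect of an outlier on a correlation coefficient, and the meaning of a regression slope) can be verified with a few lines of base R; the extra data points below are made up for illustration:

pulse <- c(70, 64, 80, 74, 92)
median(pulse)                       # 74 (question 36)

# An outlier may raise or lower a correlation, depending on where it falls (question 38)
set.seed(1)
x <- 1:10; y <- x + rnorm(10, sd = 0.5)
cor(x, y)                           # strong positive correlation
cor(c(x, 30), c(y, -10))            # a single extreme point changes the coefficient sharply

# The slope estimates the change in y for a one-unit change in x (question 39)
coef(lm(y ~ x))["x"]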
SET II
1. who was the developer of Hadoop language?
A. Apache Software Foundation B. Hadoop Software Foundation C. Sun Microsystems D. Bell Labs View Answer Ans : A
Explanation: Hadoop Developed by: Apache Software Foundation.
2. Hadoop is written in which language?
A. C B. C++ C. Java D. Python View Answer Ans : C
Explanation: Hadoop is written in Java. 3. What was the initial release date of Hadoop?
A. 1st April 2007 B. 1st April 2006 C. 1st April 2008 D. 1st April 2005 View Answer Ans : B
Explanation: Initial release: April 1, 2006. 4. What license is Hadoop distributed under?
A. Apache License 2.1 B. Apache License 2.2 C. Apache License 2.0 D. Apache License 1.0 View Answer Ans : C
Explanation: Hadoop is Open Source, released under Apache 2 license.
5. IBM and ________ have announced a major initiative to use Hadoop to support university courses in distributed computer programming.
A. Google B. Apple C. Facebook D. Microsoft View Answer Ans : A
Explanation: Google and IBM announced a university initiative to address Internet-scale computing. 6. On which platform does Hadoop run?
A. Bare metal B. Debian C. Cross-platform D. Unix-Like View Answer Ans : C
Explanation: Hadoop has support for cross platform operating system.
10. Which of the following is not a feature of Hadoop?
A. Suitable for Big Data Analysis B. Scalability C. Robust D. Fault Tolerance View Answer Ans : C
Explanation: Robust is not a feature of Hadoop.
1. The MapReduce algorithm contains two important tasks, namely __________.
A. mapped, reduce B. mapping, Reduction C. Map, Reduction D. Map, Reduce View Answer Ans : D
Explanation: The MapReduce algorithm contains two important tasks, namely Map and Reduce. 2. _____ takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
A. Map B. Reduce C. Both A and B D. Node View Answer Ans : A
Explanation: Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). 3. ______ task, which takes the output from a map as an input and combines those data tuples into a smaller set of tuples.
A. Map B. Reduce C. Node D. Both A and B View Answer Ans : B
Explanation: Reduce task, which takes the output from a map as an input and combines those data tuples into a smaller set of tuples. 4. In how many stages the MapReduce program executes?
A. 2 B. 3 C. 4 D. 5 View Answer
Ans : B
Explanation: A MapReduce program executes in three stages, namely the map stage, the shuffle stage, and the reduce stage. 5. Which of the following is used to schedule jobs and track the assigned jobs for the Task tracker?
A. SlaveNode B. MasterNode C. JobTracker D. Task Tracker View Answer Ans : C
Explanation: JobTracker : Schedules jobs and tracks the assigned jobs for the Task tracker. 6. Which of the following is used for an execution of a Mapper or a Reducer on a slice of data?
A. Task B. Job C. Mapper D. PayLoad View Answer Ans : A
Explanation: Task : An execution of a Mapper or a Reducer on a slice of data. 7. Which of the following command runs a DFS admin client?
A. secondaryadminnode B. nameadmin C. dfsadmin D. adminsck View Answer Ans : C
Explanation: dfsadmin : Runs a DFS admin client. 8. Point out the correct statement.
A. MapReduce tries to place the data and the compute as close as possible B. Map Task in MapReduce is performed using the Mapper() function C. Reduce Task in MapReduce is performed using the Map() function D. None of the above View Answer Ans : A
Explanation: This feature of MapReduce is "Data Locality". 9. Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in ____________
A. C B. C# C. Java D. None of the above View Answer
Ans : C
Explanation: Hadoop Pipes is a SWIG- compatible C++ API to implement MapReduce applications (non JNITM based). 10. The number of maps is usually driven by the total size of ____________
A. Inputs B. Output C. Task D. None of the above View Answer Ans : A
Explanation: Total size of inputs means the total number of blocks of the input files. 1. What is full form of HDFS?
A. Hadoop File System B. Hadoop Field System C. Hadoop File Search D. Hadoop Field search View Answer Ans : A
Explanation: Hadoop File System was developed using distributed file system design. 2. HDFS works in a __________ fashion.
A. worker-master fashion B. master-slave fashion C. master-worker fashion D. slave-master fashion View Answer Ans : B
Explanation: HDFS follows the master-slave architecture. 3. Which of the following are the Goals of HDFS?
A. Fault detection and recovery B. Huge datasets C. Hardware at data D. All of the above View Answer Ans : D
Explanation: All the above option are the goals of HDFS. 4. ________ NameNode is used when the Primary NameNode goes down.
A. Rack B. Data C. Secondary D. Both A and B View Answer Ans : C
Explanation: Secondary namenode is used for all time availability and reliability.
5. The minimum amount of data that HDFS can read or write is called a _____________.
A. Datanode B. Namenode C. Block D. None of the above View Answer Ans : C
Explanation: The minimum amount of data that HDFS can read or write is called a Block. 6. The default block size is ______.
A. 32MB B. 64MB C. 128MB D. 16MB View Answer Ans : B
Explanation: The default block size is 64MB, but it can be increased as per the need to change in HDFS configuration. 7. For every node (Commodity hardware/System) in a cluster, there will be a _________.
A. Datanode B. Namenode C. Block D. None of the above View Answer Ans : A
Explanation: For every node (Commodity hardware/System) in a cluster, there will be a datanode. 8. Which of the following is not a feature of HDFS?
A. It is suitable for the distributed storage and processing. B. Streaming access to file system data. C. HDFS provides file permissions and authentication. D. Hadoop does not provide a command interface to interact with HDFS. View Answer Ans : D
Explanation: Hadoop does provide a command interface to interact with HDFS, so option D is the statement that is not a feature. 9. HDFS is implemented in _____________ language.
A. Perl B. Python C. Java D. C View Answer Ans : C
Explanation: HDFS is implemented in Java and any computer which can run Java can host a NameNode/DataNode on it.
10. During start up, the ___________ loads the file system state from the fsimage and the edits log file.
A. Datanode B. Namenode C. Block D. ActionNode View Answer Ans : B
Explanation: During start up, the NameNode loads the file system state from the fsimage and the edits log file. 1. Which of the following is not true about Pig?
A. Apache Pig is an abstraction over MapReduce B. Pig can not perform all the data manipulation operations in Hadoop. C. Pig is a tool/platform which is used to analyze larger sets of data representing them as data flows. D. None of the above View Answer Ans : B
Explanation: Pig is generally used with Hadoop; we can perform all the data manipulation operations in Hadoop using Apache Pig. 2. Which of the following is/are a feature of Pig?
A. Rich set of operators B. Ease of programming C. Extensibility D. All of the above View Answer Ans : D
Explanation: All options are the following Features of Pig. 3. In which year apache Pig was released?
A. 2005 B. 2006 C. 2007 D. 2008 View Answer Ans : B
Explanation: In 2006, Apache Pig was developed as a research project. 4. Pig operates mainly in how many modes?
A. 2 B. 3 C. 4 D. 5 View Answer Ans : A
Explanation: You can run Pig (execute Pig Latin statements and Pig commands) in two modes: interactive mode and batch mode. 5. Which of the following companies developed Pig?
A. Google B. Yahoo C. Microsoft D. Apple View Answer Ans : B
Explanation: Apache Pig was developed as a research project at Yahoo, especially to create and execute MapReduce jobs on every dataset. 6. Which of the following function is used to read data in PIG?
A. Write B. Read C. Perform D. Load View Answer Ans : D
Explanation: PigStorage is the default load function. 7. __________ is a framework for collecting and storing script-level statistics for Pig Latin.
A. Pig Stats B. PStatistics C. Pig Statistics D. All of the above View Answer Ans : C
Explanation: The new Pig statistics and the existing Hadoop statistics can also be accessed via the Hadoop job history file. 8. Which of the following is true statement?
A. Pig is a high level language. B. Performing a Join operation in Apache Pig is pretty simple. C. Apache Pig is a data flow language. D. All of the above View Answer Ans : D
Explanation: All of the options are true statements. 9. Which of the following will compile the Pigunit?
A. $pig_trunk ant pigunit-jar B. $pig_tr ant pigunit-jar C. $pig_ ant pigunit-jar D. $pigtr_ ant pigunit-jar View Answer Ans : A
Explanation: The compile will create the pigunit.jar file.
10. Point out the wrong statement.
A. Pig can invoke code in language like Java Only B. Pig enables data workers to write complex data transformations without knowing Java C. Pig's simple SQL-like scripting language is called Pig Latin, and appeals to developers already familiar with scripting languages and SQL D. Pig is complete, so you can do all required data manipulations in Apache Hadoop with Pig View Answer Ans : A
Explanation: Through the User Defined Functions(UDF) facility in Pig, Pig can invoke code in many languages like JRuby, Jython and Java. 1. Which of the following is/are INCORRECT with respect to Hive?
A. Hive provides SQL interface to process large amount of data B. Hive needs a relational database like oracle to perform query operations and store data. C. Hive works well on all files stored in HDFS D. Both A and B View Answer Ans : B
Explanation: Hive needs a relational database like oracle to perform query operations and store data is incorrect with respect to Hive. 2. Which of the following is not a Features of HiveQL?
A. Supports joins B. Supports indexes C. Support views D. Support Transactions View Answer Ans : D
Explanation: Support Transactions is not a Features of HiveQL. 3. Which of the following operator executes a shell command from the Hive shell?
A. | B. ! C. # D. $ View Answer Ans : B
Explanation: Exclamation operator is for execution of command. 4. Hive uses _________ for logging.
A. logj4 B. log4l C. log4i D. log4j View Answer Ans : D
Explanation: By default Hive will use hive-log4j.default in the conf/ directory of the Hive installation. 5. HCatalog is installed with Hive, starting with Hive release is ___________
A. 0.10.0 B. 0.9.0 C. 0.11.0 D. 0.12.0 View Answer Ans : C
Explanation: hcat commands can be issued as hive commands, and vice versa. 6. _______ supports a new command shell Beeline that works with HiveServer2.
A. HiveServer2 B. HiveServer3 C. HiveServer4 D. HiveServer5 View Answer Ans : A
Explanation: The Beeline shell works in both embedded mode as well as remote mode. 7. The ________ allows users to read or write Avro data as Hive tables.
A. AvroSerde B. HiveSerde C. SqlSerde D. HiveQLSerde View Answer Ans : A
Explanation: AvroSerde understands compressed Avro files. 8. Which of the following data type is supported by Hive?
A. map B. record C. string D. enum View Answer Ans : D
Explanation: Hive has no concept of enums. 9. We need to store skill set of MCQs(which might have multiple values) in MCQs table, which of the following is the best way to store this information in case of Hive?
A. Create a column in MCQs table of STRUCT data type B. Create a column in MCQs table of MAP data type C. Create a column in MCQs table of ARRAY data type D. As storing multiple values in a column of MCQs itself is a violation View Answer Ans : C
Explanation: Option C is correct.
10. Letsfindcourse is generating huge amount of data. They are generating huge amount of sensor data from different courses which was unstructured in form. They moved to Hadoop framework for storing and analyzing data. What technology in Hadoop framework, they can use to analyse this unstructured data?
A. MapReduce programming B. Hive C. RDBMS D. None of the above View Answer Ans : A
Explanation: MapReduce programming is the right answer. 1. which of the following is correct statement?
A. HBase is a distributed column-oriented database B. Hbase is not open source C. Hbase is horizontally scalable. D. Both A and C View Answer Ans : D
Explanation: HBase is a distributed column-oriented database built on top of the Hadoop file system. It is an open-source project and is horizontally scalable. 2. which of the following is not a feature of Hbase?
A. HBase is lateral scalable. B. It has automatic failure support. C. It provides consistent read and writes. D. It has easy java API for client. View Answer Ans : A
Explanation: Option A is incorrect because HBase is linearly scalable. 3. When did HBase was first released?
A. April 2007 B. March 2007 C. February 2007 D. May 2007 View Answer Ans : C
Explanation: HBase was first released in February 2007. Later in January 2008, HBase became a sub project of Apache Hadoop. 4. Apache HBase is a non-relational database modeled after Google's _________
A. BigTop B. Bigtable C. Scanner D. FoundationDB View Answer Ans : B
Explanation: Bigtable acts up on Google File System, likewise Apache HBase works on top of Hadoop and HDFS. 5. HBaseAdmin and ____________ are the two important classes in this package that provide DDL functionalities.
A. HTableDescriptor B. HDescriptor C. HTable D. HTabDescriptor View Answer Ans : A
Explanation: Java provides an Admin API to achieve DDL functionalities through programming 6. which of the following is correct statement?
A. HBase provides fast lookups for larger tables. B. It provides low latency access to single rows from billions of records C. HBase is a database built on top of the HDFS. D. All of the above View Answer Ans : D
Explanation: All the options are correct. 7. HBase supports a ____________ interface via Put and Result.
A. bytes-in/bytes-out B. bytes-in C. bytes-out D. None of the above View Answer Ans : A
Explanation: Input could be strings, numbers, complex objects, or even images as long as they can rendered as bytes. 8. Which command is used to disable all the tables matching the given regex?
A. remove all B. drop all C. disable_all D. None of the above View Answer Ans : C
Explanation: The syntax for disable_all command is as follows : hbase > disable_all 'r.*' 9. _________ is the main configuration file of HBase.
A. hbase.xml B. hbase-site.xml C. hbase-site-conf.xml D. hbase-conf.xml View Answer Ans : B
Explanation: Set the data directory to an appropriate location by opening the HBase home folder in /usr/local/HBase. 10. which of the following is incorrect statement?
A. HBase is built for wide tables B. Transactions are there in HBase. C. HBase has de-normalized data. D. HBase is good for semi-structured as well as structured data. View Answer Ans : B
Explanation: No transactions are there in HBase. 1. R was created by?
A. Ross Ihaka B. Robert Gentleman C. Both A and B D. Ross Gentleman View Answer Ans : C
Explanation: R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team. 2. R allows integration with the procedures written in the?
A. C B. Ruby C. Java D. Basic View Answer Ans : A
Explanation: R allows integration with the procedures written in the C, C++, .Net, Python or FORTRAN languages for efficiency. 3. R is free software distributed under a GNU-style copy left, and an official part of the GNU project called?
A. GNU A B. GNU S C. GNU L D. GNU R View Answer Ans : B
Explanation: R is free software distributed under a GNU-style copy left, and an official part of the GNU project called GNU S. 4. R made its first appearance in?
A. 1992 B. 1995 C. 1993 D. 1994 View Answer
Ans : C
Explanation: R made its first appearance in 1993. 5. Which of the following is true about R?
A. R is a well-developed, simple and effective programming language B. R has an effective data handling and storage facility C. R provides a large, coherent and integrated collection of tools for data analysis. D. All of the above View Answer Ans : D
Explanation: All of the above statement are true. 6. Point out the wrong statement?
A. Setting up a workstation to take full advantage of the customizable features of R is a straightforward thing B. q() is used to quit the R program C. R has an inbuilt help facility similar to the man facility of UNIX D. Windows versions of R have other optional help systems also View Answer Ans : B
Explanation: help command is used for knowing details of particular command in R. 7. Command lines entered at the console are limited to about ________ bytes
A. 4095 B. 4096 C. 4097 D. 4098 View Answer Ans : A
Explanation: Elementary commands can be grouped together into one compound expression by braces (‘{’ and ‘}’). 8. R language is a dialect of which of the following languages?
A. s B. c C. sas D. matlab View Answer Ans : A
Explanation: The R language is a dialect of S which was designed in the 1980s. Since the early 90’s the life of the S language has gone down a rather winding path. The scoping rules for R are the main feature that makes it different from the original S language. 9. How many atomic vector types does R have?
A. 3 B. 4 C. 5 D. 6 View Answer
Ans : D
Explanation: R language has 6 atomic data types. They are logical, integer, real, complex, string (or character) and raw. There is also a class for “raw” objects, but they are not commonly used directly in data analysis. 10. R files has an extension _____.
A. .S B. .RP C. .R D. .SP View Answer Ans : C
Explanation: All R files have an extension .R. R provides a mechanism for recalling and re-executing previous commands. All S program files have an extension .S, but R has many more functions than S. 1. What will be output for the following code?
v <- TRUE
print(class(v))
A. logical B. Numeric C. Integer D. Complex View Answer Ans : A
Explanation: It produces the following result : [1] "logical"
2. What will be output for the following code?
v <- ""TRUE""
print(class(v))
A. logical B. Numeric C. Integer D. Character View Answer Ans : D
Explanation: It produces the following result : [1] "character"
3. In R programming, the very basic data types are the R-objects called?
A. Lists B. Matrices
C. Vectors D. Arrays View Answer Ans : C
Explanation: In R programming, the very basic data types are the R-objects called vectors
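A short console illustration of the point above: vectors are the basic R objects, and class() reports the atomic type (logical, integer, numeric, complex, character or raw):

v <- c(TRUE, FALSE);      class(v)   # "logical"
v <- c(1L, 2L, 3L);       class(v)   # "integer"
v <- c(1.5, 2.5);         class(v)   # "numeric"
v <- 1 + 2i;              class(v)   # "complex"
v <- c("TRUE", "FALSE");  class(v)   # "character" (quoted values are strings)
v <- charToRaw("R");      class(v)   # "raw"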
4. Data Frames are created using the?
A. frame() function B. data.frame() function C. data() function D. frame.data() function View Answer Ans : B
Explanation: Data Frames are created using the data.frame() function 5. Which functions gives the count of levels?
A. level B. levels C. nlevels D. nlevel View Answer Ans : C
Explanation: Factors are created using the factor() function. The nlevels functions gives the count of levels. 6. Point out the correct statement?
A. Empty vectors can be created with the vector() function B. A sequence is represented as a vector but can contain objects of different classes C. "raw” objects are commonly used directly in data analysis D. The value NaN represents undefined value View Answer Ans : A
Explanation: A vector can only contain objects of the same class. 7. What will be the output of the following R code?
> x <- vector(""numeric"", length = 10)
> x
A. 1 0 B. 0 0 0 0 0 0 0 0 0 0 C. 0 1 D. 0 0 1 1 0 1 1 0 View Answer Ans : B
Explanation: You can also use the vector() function to initialize vectors.
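Putting questions 4 to 7 together, a small sketch showing data.frame(), factor() with nlevels(), and vector() initialization:

x <- vector("numeric", length = 10)  # initializes a numeric vector filled with zeros
x                                    # 0 0 0 0 0 0 0 0 0 0

df <- data.frame(name = c("A", "B", "C"),
                 grade = factor(c("pass", "fail", "pass")))
nlevels(df$grade)                    # 2
levels(df$grade)                     # "fail" "pass"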
8. What will be output for the following code?
> sqrt(-17)
A. -4.02 B. 4.02 C. 3.67 D. NAN View Answer Ans : D
Explanation: The square root of a negative number is not a real number, so sqrt(-17) produces NaN along with a warning. 9. _______ function returns a vector of the same size as x with the elements arranged in increasing order.
A. sort() B. orderasc() C. orderby() D. sequence() View Answer Ans : A
Explanation: There are other more flexible sorting facilities available like order() or sort.list() which produce a permutation to do the sorting. 10. What will be the output of the following R code?
> m <- matrix(nrow = 2, ncol = 3)
> dim(m)
A. 3 3 B. 3 2 C. 2 3 D. 2 2 View Answer Ans : C
Explanation: Matrices are constructed column-wise. 1. Which loop executes a sequence of statements multiple times and abbreviates the code that manages the loop variable?
A. for B. while C. do-while D. repeat View Answer Ans : D
Explanation: repeat loop : Executes a sequence of statements multiple times and abbreviates the code that manages the loop variable. 2. Which of the following true about for loop?
A. Repeats a statement or group of statements while a given condition is true. It tests the condition before executing the loop body. B. it tests the condition at the end of the loop body. C. Both A and B D. None of the above View Answer Ans : B
Explanation: for loop : Like a while statement, except that it tests the condition at the end of the loop body. 3. Which statement simulates the behavior of R switch?
A. Next B. Previous C. break D. goto View Answer Ans : A
Explanation: The next statement simulates the behavior of R switch. 4. In which statement terminates the loop statement and transfers execution to the statement immediately following the loop?
A. goto B. switch C. break D. label View Answer Ans : C
Explanation: Break : Terminates the loop statement and transfers execution to the statement immediately following the loop. 5. Point out the wrong statement?
A. Multi-line expressions with curly braces are just not that easy to sort through when working on the command line B. lappy() loops over a list, iterating over each element in that list C. lapply() does not always returns a list D. You cannot use lapply() to evaluate a function multiple times each with a different argument View Answer Ans : C
Explanation: lapply() always returns a list, regardless of the class of the input. 6. The mapply() function is a multivariate apply of sorts which applies a function in parallel over a set of arguments.
A. TRUE B. FALSE C. Can be true or false D. Can not say View Answer Ans : A
Explanation: True, The mapply() function is a multivariate apply of sorts which applies a function in parallel over a set of arguments. 7. Which of the following is valid body of split function?
A. function (x, f) B. function (x, f, drop = FALSE, …) C. function (x, drop = FALSE, …) D. function (drop = FALSE, …) View Answer Ans : B
Explanation: x is a vector (or list) or data frame 8. Which of the following character skip during execution?
v <- LETTERS[1:6]
for ( i in v) {
if (i == ""D"") {
next
}
print(i)
}
A. A B. B C. C D. D View Answer Ans : D
Explanation: When the above code is compiled and executed, it produces the following result : [1] "A" [1] "B" [1] "C" [1] "E" [1] "F"
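The loop questions above can be tried directly at the console. The sketch below shows repeat with break, and for with next (the same control flow as the LETTERS example just shown):

i <- 1
repeat {                    # repeat runs until an explicit break
  if (i > 3) break          # break terminates the loop
  print(i)
  i <- i + 1
}

for (ch in LETTERS[1:6]) {
  if (ch == "D") next       # next skips the rest of this iteration, so "D" is not printed
  print(ch)
}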
9. What will be output for the following code?
v <- LETTERS[1]
for ( i in v) {
print(v)
}
A. A B. A B C. A B C D. A B C D View Answer Ans : A
Explanation: The output for the following code : [1] "A" 10. What will be output for the following code?
v <- LETTERS[""A""]
for ( i in v) {
print(v)
}
A. A B. NAN C. NA D. Error View Answer Ans : C
Explanation: The output for the following code : [1] NA 1. An R function is created by using the keyword?
A. fun B. function C. declare D. extends View Answer Ans : B
Explanation: An R function is created by using the keyword function. 2. What will be output for the following code?
print(mean(25:82))
A. 1526 B. 53.5 C. 50.5 D. 55 View Answer Ans : B
Explanation: The code will find mean of numbers from 25 to 82 that is 53.5 3. Point out the wrong statement?
A. Functions in R are “second class objects” B. The writing of a function allows a developer to create an interface to the code, that is explicitly specified with a set of parameters
C. Functions provides an abstraction of the code to potential users D. Writing functions is a core activity of an R programmer View Answer Ans : A
Explanation: Functions in R are “first class objects”, which means that they can be treated much like any other R object. 4. What will be output for the following code?
> paste("a", "b", se = ":")
A. a+b B. a:b C. a-b D. None of the above View Answer Ans : D
Explanation: With the paste() function, the arguments sep and collapse must be named explicitly and in full if the default values are not going to be used. 5. Which function in R language is used to find out whether the means of 2 groups are equal to each other or not?
A. f.tests () B. l.tests () C. t.tests () D. p.tests () View Answer Ans : C
Explanation: t.tests () function in R language is used to find out whether the means of 2 groups are equal to each other. It is not used most commonly in R. It is used in some specific conditions. 6. What will be the output of log (-5.8) when executed on R console?
A. NA B. NAN C. 0.213 D. Error View Answer Ans : B
Explanation: Executing the above on the R console will display a warning that NaN (Not a Number) is produced, because it is not possible to take the log of a negative number. 7. Which function is preferred over sapply, as vapply allows the programmer to specify the output type?
A. Lapply B. Japply C. Vapply D. Zapply View Answer
Ans : C
Explanation: Vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use. simplify2array() is the utility called from sapply() when simplify is not false and is similarly called from mapply(). 8. How will you check if an element is present in a vector?
A. Match() B. Dismatch() C. Mismatch() D. Search() View Answer Ans : A
Explanation: It can be done using the match () function- match () function returns the first appearance of a particular element. The other way is to use %in% which returns a Boolean value either true or false. 9. You can check to see whether an R object is NULL with the _________ function.
A. is.null() B. is.nullobj() C. null() D. as.nullobj() View Answer Ans : A
Explanation: It is sometimes useful to allow an argument to take the NULL value, which might indicate that the function should take some specific action. 10. In the base graphics system, which function is used to add elements to a plot?
A. Boxplot() B. Text() C. Treat() D. Both A and B View Answer Ans : D
Explanation: In the base graphics system, boxplot or text function is used to add elements to a plot. 1. Which of the following syntax is used to install forecast package?
A. install.pack("forecast") B. install.packages("cast") C. install.packages("forecast") D. install.pack("forecastcast") View Answer Ans : C
Explanation: forecast is used for time series analysis 2. Which splits a data frame and returns a data frame?
A. apply B. ddply
C. stats D. plyr View Answer Ans : B
Explanation: ddply splits a data frame and returns a data frame. 3. Which of the following is an R package for the exploratory analysis of genetic and genomic data?
A. adeg B. adegenet C. anc D. abd View Answer Ans : B
Explanation: This package contains Classes and functions for genetic data analysis within the multivariate framework. 4. Which of the following contains functions for processing uniaxial minute-to-minute accelerometer data?
A. accelerometry B. abc C. abd D. anc View Answer Ans : A
Explanation: This package contains a collection of functions that perform operations on time-series accelerometer data, such as identifying non-wear time, flagging minutes that are part of an activity bout, and finding the maximum 10-minute average count value.
A. G.A. B. G2db C. G.S. D. G1DBN View Answer Ans : C
Explanation: The function returns a GriegSmith object which is a matrix with block sizes, sum of squares for each block size as well as mean sums of squares. G1DBN is a package performing Dynamic Bayesian Network Inference. 6. Which of the following package provide namespace management functions not yet present in base R?
A. stringr B. nbpMatching C. messagewarning D. namespace View Answer Ans : D
Explanation: The package namespace is one of the most confusing parts of building a package. nbpMatching contains functions for non-bipartite optimal matching. 7. What will be the output of the following R code?
install.packages(c("devtools", "roxygen2"))
A. Develops the tools B. Installs the given packages C. Exits R studio D. Nothing happens View Answer Ans : B
Explanation: Make sure you have the latest version of R and then run the above code to get the packages you’ll need. It installs the given packages. Confirm that you have a recent version of RStudio. 8. A bundled package is a package that’s been compressed into a ______ file.
A. Double B. Single C. Triple D. No File View Answer Ans : B
Explanation: A bundled package is a package that’s been compressed into a single file. A source package is just a directory with components like R/, DESCRIPTION, and so on. 9. .library() is not useful when developing a package since you have to install the package first.
A. TRUE B. FALSE C. Can be true or false D. Can not say View Answer Ans : A
Explanation: library() is not useful when developing a package since you have to install the package first. A library is a simple directory containing installed packages.
10. DESCRIPTION uses a very simple file format called DCF.
A. TRUE B. FALSE C. Can be true or false D. Can not say View Answer Ans : A
Explanation: DESCRIPTION uses a very simple file format called DCF, the Debian control format. When you first start writing packages, you’ll mostly use these metadata to record what packages are needed to run your package.
37. While installing Hadoop, how many XML files are edited and which are they? 1. core-site.xml 2. hdfs-site.xml 3. mapred-site.xml 4. yarn-site.xml Answer: All four files (core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml) are edited.
This set of Object Oriented Programming using C++ Assessment Questions and Answers focuses on “Pointer to Objects”. 1. Which language among the following doesn’t allow pointers? a) C++ b) Java c) Pascal d) C Answer: b Explanation: The concept of pointers is not supported in Java. The feature is not given in the language but can be used in some ways explicitly. Though this pointer is supported by java too. 2. Which is correct syntax for declaring pointer to object? a) className* objectName; b) className objectName; c) *className objectName; d) className objectName(); Answer: a Explanation: The syntax must contain * symbol after the className as the type of object. This declares an object pointer. This can store address of any object of the specified class. 3. Which operator should be used to access the members of the class using object pointer? a) Dot operator b) Colon to the member c) Scope resolution operator d) Arrow operator Answer: d Explanation: The members can be accessed from the object pointer by using arrow operator. The arrow operator can be used only with the pointer of class type. If simple object is declared, it must use dot operator to access the members. 4. How does compiler decide the intended object to be used, if more than one object are used? a) Using object name b) Using an integer pointer c) Using this pointer d) Using void pointer Answer: c Explanation: This pointer denotes the object, in which it is being used. If member function is called with respect to one object then this pointer refers to the same object members. It can be used when members with same name are involved. 5. If pointer to an object is declared __________ a) It can store any type of address b) It can store only void addresses c) It can only store address of integer type d) It can only store object address of class type specified Answer: d Explanation: The address of only the specified class type can get their address stored in the object pointer. The addresses doesn’t differ but they do differ for the amount and type of memory required for objects of different classes. Hence same class object pointer should be used. 6. What is the size of an object pointer? a) Equal to size of any usual pointer b) Equal to size of sum of all the members of object c) Equal to size of maximum sized member of object d) Equal to size of void Answer: a Explanation: The size of object pointer is same as that of any usual pointer. This is because only the address have to be stored. There are no values to be stored in the pointer. 7. A pointer _________________ a) Can point to only one object at a time b) Can point to more than one objects at a time c) Can point to only 2 objects at a time d) Can point to whole class objects at a time Answer: a Explanation: The object pointer can point to only one object at a time. The pointer will be able to store only one address at a
time. Hence only one object can be referred. 8. Pointer to a base class can be initialized with the address of derived class, because of _________ a) derived-to-base implicit conversion for pointers b) base-to-derived implicit conversion for pointers c) base-to-base implicit conversion for pointers d) derived-to-derived implicit conversion for pointers Answer: a Explanation: It is an implicit rule defined in most of the programming languages. It permits the programmer to declare a pointer to the derived class from a base class pointer. In this way the programmer doesn’t have to declare object for derived class each time it is required. 9. Can pointers to object access the private members of the class? a) Yes, always b) Yes, only if it is only pointer to object c) No, because objects can be referenced from another objects too d) No, never Answer: d Explanation: The pointers to an object can never access the private members of the class outside the class. The object can indirectly use those private members using member functions which are public in the class. 10. Is name of an array of objects is also a pointer to object? a) Yes, always b) Yes, in few cases c) No, because it represents more than one object d) No, never Answer: a Explanation: The array name represents a pointer to the object. The name alone can represent the starting address of the array. But that also represents an array which is in turn stored in a pointer. 11. Which among the following is true? a) The pointer to object can hold address only b) The pointer can hold value of any type c) The pointer can hold only void reference d) The pointer can’t hold any value Answer: a Explanation: The pointer to an object can hold only the addresses. Address of any other object of same class. This allows the programmer to link more than one objects if required. 12. Which is the correct syntax to call a member function using pointer? a) pointer->function() b) pointer.function() c) pointer::function() d) pointer:function() Answer: a Explanation: The pointer should be mentioned followed by the arrow operator. Arrow operator is applicable only with the pointers. Then the function name should be mentioned that is to be called. 13. If a pointer to an object is created and the object gets deleted without using the pointer then __________ a) It becomes void pointer b) It becomes dangling pointer c) It becomes null pointer d) It becomes zero pointer Answer: b Explanation: When the address pointed by the object pointer gets deleted, the pointer now points to an invalid address. Hence it becomes a dangling pointer. It can’t be null or void pointer since it doesn’t point to any specific location. 14. How can the address stored in the pointer be retrieved? a) Using * symbol b) Using $ symbol c) Using & symbol d) Using @ symbol Answer: c Explanation: The & symbol must be used. This should be done such that the object should be preceded by & symbol and then
the address should be stored in another variable. This is done to get the address where the object is stored. 15. What should be done to prevent changes that may be made to the values pointed by the pointer? a) Usual pointer can’t change the values pointed b) Pointer should be made virtual c) Pointer should be made anonymous d) Pointer should be made const Answer: d Explanation: The pointer should be declared as a const type. This prevents the pointer to change any value that is being pointed from it. This is a feature that is made to access the values using pointer but to make sure that pointer doesn’t change those values accidently. 16. References to object are same as pointers of object. a) True b) False Answer: b Explanation: The references are made to object when the object is created and initialized with another object without calling any constructor. But the object pointer must be declared explicitly using * symbol that will be capable of storing some address. Hence both are different.
This set of Basic Object Oriented Programming using C++ Questions and Answers focuses on “Copy Constructor”. 1. Copy constructor is a constructor which ________________ a) Creates an object by copying values from any other object of same class b) Creates an object by copying values from first object created for that class c) Creates an object by copying values from another object of another class d) Creates an object by initializing it with another previously created object of same class Answer: d Explanation: The object that has to be copied to new object must be previously created. The new object gets initialized with the same values as that of the object mentioned for being copied. The exact copy is made with values. 2. The copy constructor can be used to ____________ a) Initialize one object from another object of same type b) Initialize one object from another object of different type c) Initialize more than one object from another object of same type at a time d) Initialize all the objects of a class to another object of another class Answer: a Explanation: The copy constructor has the most basic function to initialize the members of an object with same values as that of some previously created object. The object must be of same class. 3. If two classes have exactly same data members and member function and only they differ by class name. Can copy constructor be used to initialize one class object with another class object? a) Yes, possible b) Yes, because the members are same c) No, not possible d) No, but possible if constructor is also same Answer: c Explanation: The restriction for copy constructor is that it must be used with the object of same class. Even if the classes are exactly same the constructor won’t be able to access all the members of another class. Hence we can’t use object of another class for initialization. 4. The copy constructors can be used to ________ a) Copy an object so that it can be passed to a class b) Copy an object so that it can be passed to a function c) Copy an object so that it can be passed to another primitive type variable d) Copy an object for type casting Answer: b Explanation: When an object is passed to a function, actually its copy is made in the function. To copy the values, copy constructor is used. Hence the object being passed and object being used in function are different. 5. Which returning an object, we can use ____________ a) Default constructor b) Zero argument constructor c) Parameterized constructor d) Copy constructor Answer: d Explanation: While returning an object we can use the copy constructor. When we assign the return value to another object of same class then this copy constructor will be used. And all the members will be assigned the same values as that of the object being returned. 6. If programmer doesn’t define any copy constructor then _____________ a) Compiler provides an implicit copy constructor b) Compiler gives an error c) The objects can’t be assigned with another objects d) The program gives run time error if copying is used Answer: a Explanation: The compiler provides an implicit copy constructor. It is not mandatory to always create an explicit copy constructor. The values are copied using implicit constructor only. 7. If a class implements some dynamic memory allocations and pointers then _____________ a) Copy constructor must be defined b) Copy constructor must not be defined c) Copy constructor can’t be defined d) Copy constructor will not be used
Answer: a Explanation: In the case where dynamic memory allocation is used, the copy constructor definition must be given. The implicit copy constructor is not capable of manipulating the dynamic memory and pointers. Explicit definition allows to manipulate the data as required. 8. What is the syntax of copy constructor? a) classname (classname &obj){ /*constructor definition*/ } b) classname (cont classname obj){ /*constructor definition*/ } c) classname (cont classname &obj){ /*constructor definition*/ } d) classname (cont &obj){ /*constructor definition*/ } Answer: c Explanation: The syntax must contain the class name first, followed by the classname as type and &object within parenthesis. Then comes the constructor body. The definition can be given as per requirements. 9. Object being passed to a copy constructor ___________ a) Must be passed by reference b) Must be passed by value c) Must be passed with integer type d) Must not be mentioned in parameter list Answer: a Explanation: This is mandatory to pass the object by reference. Otherwise, the object will try to create another object to copy its values, in turn a constructor will be called, and this will keep on calling itself. This will cause the compiler to give out of memory error. 10. Out of memory error is given when the object _____________ to the copy constructor. a) Is passed with & symbol b) Is passed by reference c) Is passed as <classname &obj> d) Is not passed by reference Answer: d Explanation: All the options given, directly or indirectly indicate that the object is being passed by reference. And if object is not passed by reference then the out of memory error is produced. Due to infinite constructor call of itself. 11. Copy constructor will be called whenever the compiler __________ a) Generates implicit code b) Generates member function calls c) Generates temporary object d) Generates object operations Answer: c Explanation: Whenever the compiler creates a temporary object, copy constructor is used to copy the values from existing object to the temporary object. 12. The deep copy is possible only with the help of __________ a) Implicit copy constructor b) User defined copy constructor c) Parameterized constructor d) Default constructor Answer: b Explanation: While using explicit copy constructor, the pointers of copied object point to the intended memory location. This is assured since the programmers themselves manipulate the addresses. 13. Can a copy constructor be made private? a) Yes, always b) Yes, if no other constructor is defined c) No, never d) No, private members can’t be accessed Answer: a Explanation: The copy constructor can be defined as private. If we make it private then the objects of the class can’t be copied. It can be used when a class used dynamic memory allocation. 14. The arguments to a copy constructor _____________ a) Must be const b) Must not be cosnt c) Must be integer type
d) Must be static Answer: a Explanation: The object should not be modified in the copy constructor. Because the object itself is being copied. When the object is returned from a function, the object must be a constant otherwise the compiler creates a temporary object which can die anytime. 15. Copy constructors are overloaded constructors. a) True b) False Answer: a Explanation: The copy constructors are always overloaded constructors. They have to be. All the classes have a default constructor and other constructors are basically overloaded constructors.
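For reference, here is a minimal, illustrative C++ sketch (not part of the original question set; the class name Buffer is invented for the example) showing the const-reference copy-constructor syntax discussed above and a user-defined deep copy for a class that owns dynamic memory:

#include <cstring>
#include <iostream>

class Buffer {
    char* data;                               // dynamically allocated member
public:
    Buffer(const char* s) : data(new char[std::strlen(s) + 1]) { std::strcpy(data, s); }
    // Copy constructor: takes a const reference (passing by value would call
    // itself endlessly) and deep-copies the dynamic memory.
    Buffer(const Buffer& other) : data(new char[std::strlen(other.data) + 1]) {
        std::strcpy(data, other.data);
    }
    ~Buffer() { delete[] data; }
    void print() const { std::cout << data << '\n'; }
};

int main() {
    Buffer a("hello");
    Buffer b = a;        // copy constructor is invoked here
    b.print();
    return 0;
}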
This set of Object Oriented Programming using C++ Interview Questions and Answers for Experienced people focuses on “Passing Object to Functions”. 1. Passing object to a function _______________ a) Can be done only in one way b) Can be done in more than one ways c) Is not possible d) Is not possible in OOP Answer: b Explanation: The objects can be passed to the functions and this requires OOP concept because objects are main part of OOP. The objects can be passed in more than one way to a function. The passing depends on how the object have to be used. 2. The object ________________ a) Can be passed by reference b) Can be passed by value c) Can be passed by reference or value d) Can be passed with reference Answer: c Explanation: The objects can be passed by reference if required to use the same object. The values can be passed so that the main object remains same and no changes are made to it if the function makes any changes to the values being passed. 3. Which symbol should be used to pass the object by reference in C++? a) & b) @ c) $ d) $ or & Answer: a Explanation: The object to be passed by reference to the function should be preceded by & symbol in the argument list syntax of the function. This indicates the compiler not to use new object. The same object which is being passed have to be used. 4. If object is passed by value ______________ a) Copy constructor is used to copy the values into another object in the function b) Copy constructor is used to copy the values into temporary object c) Reference to the object is used to access the values of the object d) Reference to the object is used to created new object in its place Answer: a Explanation: The copy constructor is used. This constructor is used to copy the values into a new object which will contain all the values same as that of the object being passed but any changes made to the newly created object will not affect the original object. 5. Pass by reference of an object to a function _______________ a) Affects the object in called function only b) Affects the object in prototype only c) Affects the object in caller function d) Affects the object only if mentioned with & symbol with every call Answer: c Explanation: The original object in the caller function will get affected. The changes made in the called function will be same in the caller function object also. 6. Copy constructor definition requires __________________ a) Object to be passed by value b) Object not to be passed to it c) Object to be passed by reference d) Object to be passed with each data member value Answer: c Explanation: The object must be passed by reference to a copy constructor. This is to avoid the out of memory error. The constructors keeps calling itself, if not passed by reference, and goes out of memory. 7. What is the type of object that should be specified in the argument list? a) Function name b) Object name itself c) Caller function name d) Class name of object Answer: d
Explanation: The type of object is the class itself. The class name have to be specified in order to pass the objects to a function. This allows the program to create another object of same class or to use the same object that was passed. 8. If an object is passed by value, _________________ a) Temporary object is used in the function b) Local object in the function is used c) Only the data member values are used d) The values are accessible from the original object Answer: b Explanation: When an object is called by values, copy constructor is called and object is copied to the local object of the function which is mentioned in the argument list. The values gets copied and are used from the local object. There is no need to access the original object again. 9. Can data members be passed to a function using the object? a) Yes, it can be passed only inside class functions b) Yes, only if the data members are public and are being passed to a function outside the class c) No, can’t be passed outside the class d) No, can’t be done Answer: b Explanation: The data members can be passed with help of object but only if the member is public. The object will obviously be used outside the class. The object must have access to the data member so that its value or reference is used outside the class which is possible only if the member is public. 10. What exactly is passed when an object is passed by reference? a) The original object name b) The original object class name c) The exact address of the object in memory d) The exact address of data members Answer: c Explanation: The location of the object, that is, the exact memory location is passed, when the object is passed by reference. The pass by reference is actually a reference to the object that the function uses with another name to the same memory location as the original object uses. 11. If the object is not to be passed to any function but the values of the object have to be used then? a) The data members should be passed separately b) The data members and member functions have to be passed separately c) The values should be present in other variables d) The object must be passed Answer: a Explanation: The data members can be passed separately. There is no need to pass whole object, instead we can use the object to pass only the required values. 12. Which among the following is true? a) More than one object can’t be passed to a function b) Any number of objects can be passed to a function c) Objects can’t be passed, only data member values can be passed d) Objects should be passed only if those are public in class Answer: b Explanation: There is no restriction on passing the number of objects to a function. The operating system or the compiler or environment may limit the number of arguments. But there is no limit on number of objects till that limit. 13. What will be the output if all necessary code is included (Header files and main function)? void test (Object &y) { y = "It is a string"; } void main() { Object x = null; test (x); System.out.println (x); } a) Run time error
b) Compile time error c) Null d) It is a string Answer: d Explanation: This is because the x object is passed by reference. The changes made inside the function will be applicable to original function too. 14. In which type is new memory location will be allocated? a) Only in pass by reference b) Only in pass by value c) Both in pass by reference and value d) Depends on the code Answer: b Explanation: The new memory location will be allocated only if the object is passed by value. Reference uses the same memory address and is denoted by another name also. But in pass by value, another object is created and new memory space is allocated for it. 15. Pass by reference and pass by value can’t be done simultaneously in a single function argument list. a) True b) False Answer: b Explanation: There is no condition which specifies that only the reference pass or values pass is allowed. The argument list can contain one reference pass and another value pass. This helps to manipulate the objects with functions more easily.
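As a small illustration of the pass-by-value versus pass-by-reference behaviour described in the questions above (the class and function names are invented for the example):

#include <iostream>

class Counter {
public:
    int value = 0;
};

void byValue(Counter c)      { c.value = 100; }   // modifies a local copy only
void byReference(Counter& c) { c.value = 100; }   // modifies the caller's object

int main() {
    Counter x;
    byValue(x);
    std::cout << x.value << '\n';   // prints 0: the original is untouched
    byReference(x);
    std::cout << x.value << '\n';   // prints 100: the original was changed
    return 0;
}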
This set of Object Oriented Programming using C++ Interview Questions and Answers for freshers focuses on “Overriding Member Functions”. 1. Which among the following best describes member function overriding? a) Member functions having same name in base and derived classes b) Member functions having same name in base class only c) Member functions having same name in derived class only d) Member functions having same name and different signature inside main function Answer: a Explanation: The member function which is defined in base class and again in the derived class, is overridden by the definition given in the derived class. This is because the preference is given more to the local members. When derived class object calls that function, definition from the derived class is used. 2. Which among the following is true? a) Inheritance must not be used when overriding is used b) Overriding can be implemented without using inheritance c) Inheritance must be done, to use overriding d) Inheritance is mandatory only if more than one function is overridden Answer: c Explanation: The inheritance must be used in order to use function overriding. If inheritance is not used, the functions can only be overloaded. There must be a base class and a derived class to override the function of base class. 3. Which is the correct condition for function overriding? a) The declaration must not be same in base and derived class b) The declaration must be exactly the same in base and derived class c) The declaration should have at least 1 same argument in declaration of base and derived class d) The declaration should have at least 1 different argument in declaration of base and derived class Answer: b Explanation: For a function to be overridden, the declaration must be exactly the same. There must not be any different syntax used. This will ensure that the function to be overridden is only the one intended to be overridden from the derived class. 4. Exactly same declaration in base and derived class includes______________ a) Only same name b) Only same return type and name c) Only same return type and argument list d) All the same return type, name and parameter list Answer: d Explanation: Declaration includes the whole prototype of the function. The return type, name and the parameter list must be same in order to confirm that the function is same in derived and the base class. And hence can be overridden. 5. Which function will be overridden by the function defined in the derived class below: class A { int i; void show() { cout<<i; } void print() { cout <<i; } }; class B { int j; void show() { cout<<j; } }; a) show() b) print() c) show() and print()
d) Compile time error Answer: a Explanation: The declaration must be exactly same in the derived class and base class. The derived class have defined show() function with exactly same declaration. This then shows that the function in base class is being overridden if show() is called from the object of class B. 6. How to access the overridden method of base class from the derived class? a) Using arrow operator b) Using dot operator c) Using scope resolution operator d) Can’t be accessed once overridden Answer: c Explanation: Scope resolution operator :: can be used to access the base class method even if overridden. To access those, first base class name should be written followed by the scope resolution operator and then the method name. 7. The functions to be overridden _____________ a) Must be private in base class b) Must not be private base class c) Must be private in both derived and base class d) Must not be private in both derived and base class Answer: b Explanation: If the function is private in the base class, derived class won’t be able to access it. When the derived class can’t access the function to be overridden then it won’t be able to override it with any definition. 8. Which language doesn’t support the method overriding implicitly? a) C++ b) C# c) Java d) SmallTalk Answer: b Explanation: The feature of method overriding is not provided in C#. To override the methods, one must use override or virtual keywords explicitly. This is done to remove accidental changes in program and unintentional overriding. 9. In C# ____________________ a) Non – virtual or static methods can’t be overridden b) Non – virtual and static methods only can be overridden c) Overriding is not allowed d) Overriding must be implemented using C++ code only Answer: a Explanation: The non-virtual and static methods can’t be overridden in C# language. The restriction is made from the language implicitly. Only the methods that are abstract, virtual or override can be overridden. 10. In Delphi ______________ a) Method overriding is done implicitly b) Method overriding is not supported c) Method overriding is done with directive override d) Method overriding is done with the directive virtually Answer: c Explanation: This is possible but only if the method to be overridden is marked as dynamic or virtual. It is inbuilt restriction of programming language. This is done to reduce the accidental or unintentional overriding. 11. What should be used to call the base class method from the derived class if function overriding is used in Java? a) Keyword super b) Scope resolution c) Dot operator d) Function name in parenthesis Answer: a Explanation: The keyword super must be used to access base class members. Even when overriding is used, super must be used with the dot operator. The overriding is possible. 12. In Kotlin, the function to be overridden must be ______________ a) Private b) Open c) Closed
d) Abstract Answer: b Explanation: The function to be overridden must be open. This is a condition in Kotlin for any function to be overridden. This avoids accidental overriding. 13. Abstract functions of a base class _________________ a) Are overridden by the definition in same class b) Are overridden by the definition in parent class c) Are not overridden generally d) Are overridden by the definition in derived class Answer: d Explanation: The functions declared to be abstract in base class are redefined in derived classes. That is, the functions are overridden by the definitions given in the derived classes. This must be done to give at least one definition to each undefined function. 14. If virtual functions are defined in the base class then _______________ a) It is not necessary for derived classes to override those functions b) It is necessary for derived classes to override those functions c) Those functions can never be derived d) Those functions must be overridden by all the derived classes Answer: a Explanation: The derived classes doesn’t have to redefine and override the base class functions. If one definition is already given it is not mandatory for any derived class to override those functions. The base class definition will be used. 15. Which feature of OOP is exhibited by the function overriding? a) Inheritance b) Abstraction c) Polymorphism d) Encapsulation Answer: c Explanation: The polymorphism feature is exhibited by function overriding. Polymorphism is the feature which basically defines that same named functions can have more than one functionalities.
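A minimal sketch of the overriding behaviour and of reaching the base-class version with the scope resolution operator, as discussed above (the class names are illustrative):

#include <iostream>

class Base {
public:
    void show() { std::cout << "Base::show\n"; }
};

class Derived : public Base {
public:
    void show() { std::cout << "Derived::show\n"; }   // same declaration overrides Base::show
};

int main() {
    Derived d;
    d.show();         // Derived::show, the local (derived) definition is preferred
    d.Base::show();   // scope resolution reaches the overridden base version
    return 0;
}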
This set of Object Oriented Programming using C++ Interview Questions and Answers focuses on “Passing and Returning Object with Functions”. 1. In how many ways can an object be passed to a function? a) 1 b) 2 c) 3 d) 4 Answer: c Explanation: The objects can be passed in three ways. Pass by value, pass by reference and pass by address. These are the general ways to pass the objects to a function. 2. If an object is passed by value _____________ a) A new copy of object is created implicitly b) The object itself is used c) Address of the object is passed d) A new object is created with new random values Answer: a Explanation: When an object is passed by value, a new object is created implicitly. This new object uses the implicit values assignment, same as that of the object being passed. 3. Pass by address passes the address of object _________ and pass by reference passes the address of the object _________ a) Explicitly, explicitly b) Implicitly, implicitly c) Explicitly, Implicitly d) Implicitly, explicitly Answer: c Explanation: Pass by address uses the explicit address passing to the function whereas pass by reference implicitly passes the address of the object. 4. If an object is passed by reference, the changes made in the function ___________ a) Are reflected to the main object of caller function too b) Are reflected only in local scope of the called function c) Are reflected to the copy of the object that is made during pass d) Are reflected to caller function object and called function object also Answer: a Explanation: When an object is passed by reference, its address is passed implicitly. This will make changes to the main function whenever any modification is done. 5. Constructor function is not called when an object is passed to a function, will its destructor be called when its copy is destroyed? a) Yes, depending on code b) Yes, must be called c) No, since no constructor was called d) No, since same object gets used Answer: b Explanation: Even though the constructor is not called when the object is passed to a function, the copy of the object is still created, where the values of the members are same. When the object have to be destroyed, the destructor is called to free the memory and resources that the object might have reserved. 6. When an object is returned by a function, a _______________ is automatically created to hold the return value. a) Temporary object b) Virtual object c) New object d) Data member Answer: a Explanation: The temporary object is created. It holds the return value. The values gets assigned as required, and the temporary object gets destroyed. 7. Is the destruction of temporary object safe (while returning object)? a) Yes, the resources get free to use b) Yes, other objects can use the memory space c) No, unexpected side effects may occur
d) No, always gives rise to exceptions Answer: c Explanation: The destruction of temporary variable may give rise to unexpected logical errors. Consider the destructor which may free the dynamically allocated memory. But this may abort the program if another is still trying to copy the values from that dynamic memory. 8. How to overcome the problem arising due to destruction of temporary object? a) Overloading insertion operator b) Overriding functions can be used c) Overloading parenthesis or returning object d) Overloading assignment operator and defining copy constructor Answer: d Explanation: The problem can be solved by overloading the assignment operator to get the values that might be getting returned while the destructor free the dynamic memory. Defining copy constructor can help us to do this in even simpler way. 9. How many objects can be returned at once? a) Only 1 b) Only 2 c) Only 16 d) As many as required Answer: a Explanation: Like any other value, only one object can be returned at ones. The only possible way to return more than one object is to return address of an object array. But that again comes under returning object pointer. 10. What will be the output of the following code? Class A { int i; public : A(int n) { i=n; cout<<”inside constructor ”; } ~A() { cout<<”destroying ”<<i; } void seti(int n) { i=n; } int geti() { return I; } }; void t(A ob) { cout<<”something ”; } int main() { A a(1); t(a); cout<<”this is i in main ”; cout<<a.geti(); } a) inside constructor something destroying 2this is i in main destroying 1 b) inside constructor something this is i in main destroying 1 c) inside constructor something destroying 2this is i in main d) something destroying 2this is i in main destroying 1 Answer: a Explanation: Although the object constructor is called only ones, the destructor will be called twice, because of destroying the copy of the object that is temporarily created. This is the concept of how the object should be passed and manipulated.
11. It is necessary to return the object if it was passed by reference to a function. a) Yes, since the object must be same in caller function b) Yes, since the caller function needs to reflect the changes c) No, the changes are made automatically d) No, the changes are made explicitly Answer: c Explanation: Having the address being passed to the function, the changes are automatically made to the main function. In all the cases if the address is being used, the same memory location will be updated with new values. 12. How many objects can be passed to a function simultaneously? a) Only 1 b) Only an array c) Only 1 or an array d) As many as required Answer: d Explanation: There is no limit to how many objects can be passed. This works in same way as that any other variable gets passed. Array and object can be passed at same time also. 13. If an object is passed by address, will be constructor be called? a) Yes, to allocate the memory b) Yes, to initialize the members c) No, values are copied d) No, temporary object is created Answer: c Explanation: A copy of all the values is created. If the constructor is called, there will be a compile time error or memory shortage. This happens because each time a constructor is called, it try to call itself again and that goes infinite times. 14. Is it possible that an object of is passed to a function, and the function also have an object of same name? a) No, Duplicate declaration is not allowed b) No, 2 objects will be created c) Yes, Scopes are different d) Yes, life span is different Answer: a Explanation: There can’t be more than one variable or object with the same name in same scope. The scope is same, since the object is passed, it becomes local to function and hence function can’t have one more object of same name. 15. Passing an object using copy constructor and pass by value are same. a) True b) False Answer: b Explanation: The copy constructor is used to copy the values from one object to other. Pass by values is not same as copy constructor method. Actually the pass by value method uses a copy constructor to copy the values in a local object.
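A compact sketch of passing and returning objects as described above; the exact constructor/destructor output depends on the compiler's copy-elision rules, so treat the comments as a guide rather than a guarantee (the names are invented for the example):

#include <iostream>

class Item {
public:
    int n;
    Item(int v) : n(v)            { std::cout << "ctor "; }
    Item(const Item& o) : n(o.n)  { std::cout << "copy "; }
    ~Item()                       { std::cout << "dtor(" << n << ") "; }
};

void use(Item x) { x.n = 99; }      // pass by value: a copy is made and later destroyed

Item make() { return Item(7); }     // returning an object (the copy may be elided)

int main() {
    Item a(1);
    use(a);                         // copy constructed for the parameter; its destructor runs at return
    Item b = make();
    std::cout << "\n" << a.n << ' ' << b.n << '\n';
    return 0;
}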
This set of Object Oriented Programming using C++ Multiple Choice Questions & Answers focuses on “Private Member Functions”. 1. Which is private member functions access scope? a) Member functions which can only be used within the class b) Member functions which can used outside the class c) Member functions which are accessible in derived class d) Member functions which can’t be accessed inside the class Answer: a Explanation: The member functions can be accessed inside the class only if they are private. The access is scope is limited to ensure the security of the private members and their usage. 2. Which among the following is true? a) The private members can’t be accessed by public members of the class b) The private members can be accessed by public members of the class c) The private members can be accessed only by the private members of the class d) The private members can’t be accessed by the protected members of the class Answer: b Explanation: The private members are accessible within the class. There is no restriction on use of private members by public or protected members. All the members can access the private member functions of the class. 3. Which member can never be accessed by inherited classes? a) Private member function b) Public member function c) Protected member function d) All can be accessed Answer: a Explanation: The private member functions can never be accessed in the derived classes. The access specifiers is of maximum security that allows only the members of self class to access the private member functions. 4. Which syntax among the following shows that a member is private in a class? a) private: functionName(parameters) b) private(functionName(parameters)) c) private functionName(parameters) d) private::functionName(parameters) Answer: c Explanation: The function declaration must contain private keyword follower by the return type and function name. Private keyword is followed by normal function declaration. 5. If private member functions are to be declared in C++ then _____________ a) private: <all private members> b) private <member name> c) private(private member list) d) private :- <private members> Answer: a Explanation: The private members doesn’t have to have the keyword with each private member. We only have to specify the keyword private followed by single colon and then private member’s are listed. 6. In java, which rule must be followed? a) Keyword private preceding list of private member’s b) Keyword private with a colon before list of private member’s c) Keyword private with arrow before each private member d) Keyword private preceding each private member Answer: d Explanation: The private keyword must be mentioned before each private member. Unlike the rule in C++ to specify private once and list all other private member’s, in java all member declarations must be preceded by the keyword private. 7. How many private member functions are allowed in a class? a) Only 1 b) Only 7 c) Only 255 d) As many as required Answer: d Explanation: There are no conditions applied on the number of private member functions that can be declared in a class. Though
the system may restrict use of too many functions depending on memory. 8. How to access a private member function of a class? a) Using object of class b) Using object pointer c) Using address of member function d) Using class address Answer: c Explanation: Even the private member functions can be called outside the class. This is possible if address of the function is known. We can use the address to call the function outside the class. 9. Private member functions ____________ a) Can’t be called from enclosing class b) Can be accessed from enclosing class c) Can be accessed only if nested class is private d) Can be accessed only if nested class is public Answer: a Explanation: The nested class members can’t be accessed in the enclosed class even though other members can be accessed. This is to ensure the class members security and not to go against the rules of private members. 10. Which function among the following can’t be accessed outside the class in java in same package? a) public void show() b) void show() c) protected show() d) static void show() Answer: c Explanation: The protected members are available within the class. And are also available in derived classes. But these members are treated as private members for outside the class and inheritance structure. Hence can’t be accessed. 11. If private members are to be called outside the class, which is a good alternative? a) Call a public member function which calls private function b) Call a private member function which calls private function c) Call a protected member function which calls private function d) Not possible Answer: a Explanation: The private member functions can be accessed within the class. A public member function can be called which in turn calls the private member function. This maintains the security and adheres to the rules of private members. 12. A private function of a derived class can be accessed by the parent class. a) True b) False Answer: b Explanation: If private functions get accessed even by the parent class that will violate the rules of private members. If the functions can be accessed then the derived class security is hindered. 13. Which error will be produced if private members are accessed? a) Can’t access private message b) Code unreachable c) Core dumped d) Bad code Answer: a Explanation: The private members access from outside the class produce an error. The error states that the code at some line can’t access the private members. And denies the access terminating the program. 14. Can main() function be made private? a) Yes, always b) Yes, if program doesn’t contain any classes c) No, because main function is user defined d) No, never Answer: d Explanation: The reason given in option “No, because main function is user defined” is wrong. The proper reason that the main function should not be private is that it should be accessible in whole program. This makes the program flexible. 15. If a function in java is declared private then it __________________
a) Can’t access the standard output b) Can access the standard output c) Can’t access any output stream d) Can access only the output streams Answer: b Explanation: The private members can access any standard input or output. There is no restriction on access to any input or output stream. Since standard input can also be used, the option “can access only the output streams” is not true.
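A short sketch of the "call a public member which calls the private function" idea from question 11 above (the class and member names are invented):

#include <iostream>

class Account {
    double balance = 0.0;
    bool isValid(double amount) { return amount > 0.0; }   // private helper
public:
    void deposit(double amount) {            // public wrapper calls the private helper
        if (isValid(amount)) balance += amount;
    }
    double getBalance() const { return balance; }
};

int main() {
    Account acc;
    acc.deposit(50.0);       // allowed: goes through the public interface
    // acc.isValid(50.0);    // error: 'isValid' is private
    std::cout << acc.getBalance() << '\n';
    return 0;
}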
This set of Object Oriented Programming using C++ Problems focuses on “Types of Member Functions”. 1. How many types of member functions are possible in general? a) 2 b) 3 c) 4 d) 5 Answer: d Explanation: There are basically 5 types of member functions possible. The types include simple, static, const, inline, and friend member functions. Any of these types can be used in a program as per requirements. 2. Simple member functions are ______________________ a) Ones defined simply without any type b) Ones defined with keyword simple c) Ones that are implicitly provided d) Ones which are defined in all the classes Answer: a Explanation: When there is no type defined for any function and just a simple syntax is used with the return type, function name and parameter list then those are known as simple member functions. This is a general definition of simple members. 3. What are static member functions? a) Functions which use only static data member but can’t be accessed directly b) Functions which uses static and other data members c) Functions which can be accessed outside the class with the data members d) Functions using only static data and can be accessed directly in main() function Answer: d Explanation: The static member functions can be accessed directly in the main function. There is no restriction on direct use. We can call them with use of objects also. But the restriction is that the static member functions can only use the static data members of the class. 4. How can static member function can be accessed directly in main() function? a) Dot operator b) Colon c) Scope resolution operator d) Arrow operator Answer: c Explanation: The static member functions can be accessed directly in the main() function. The only restriction is that those must use only static data members of the class. These functions are property of class rather than each object. 5. Correct syntax to access the static member functions from the main() function is ______________ a) classObject::functionName(); b) className::functionName(); c) className:classObject:functionName(); d) className.classObject:functionName(); Answer: b Explanation: The syntax in option b must be followed in order to call the static functions directly from the main() function. That is a predefined syntax. Scope resolution helps to spot the correct function in the correct class. 6. What are const member functions? a) Functions in which none of the data members can be changed in a program b) Functions in which only static members can be changed c) Functions which treat all the data members as constant and doesn’t allow changes d) Functions which can change only the static members Answer: c Explanation: The const member functions are intended to keep the value of all the data members of a class same and doesn’t allow any changes on them. The data members are treated as constant data and any modification inside the const function is restricted. 7. Which among the following best describes the inline member functions? a) Functions defined inside the class only b) Functions with keyword inline only c) Functions defined outside the class d) Functions defined inside the class or with the keyword inline Answer: d
Explanation: The functions which are defined with the keyword inline or are defined inside the class are treated to be inline functions. Definitions inside the class are implicitly made inline if none of the complex statements are used in the definition. 8. What are friend member functions (C++)? a) Member function which can access all the members of a class b) Member function which can modify any data of a class c) Member function which doesn’t have access to private members d) Non-member functions which have access to all the members (including private) of a class Answer: d Explanation: A non-member function of a class which can access even the private data of a class is a friend function. It is an exception on access to private members outside the class. It is sometimes considered as a member function since it has all the access that a member function in general has. 9. What is the syntax of a const member function? a) void fun() const {} b) void fun() constant {} c) void const fun() {} d) const void fun(){} Answer: a Explanation: The general syntax to be followed in order to declare a const function in a class is as in option a. The syntax may vary in different programming languages. 10. Which keyword is used to make a non-member function a friend function of a class? a) friendly b) new c) friend d) connect Answer: c Explanation: The keyword friend is provided in programming languages to use whenever a function is to be made a friend of one class or other. The keyword indicates that the function is capable of new functionalities like accessing private members. 11. Member functions _____________________ a) Must be defined inside class body b) Can be defined inside class body or outside c) Must be defined outside the class body d) Can be defined in another class Answer: b Explanation: The function definitions can be given inside or outside the body of class. If defined inside, general syntax is used. If defined outside then the class name followed by scope resolution operator and then function name must be given for the definition. 12. All types of member functions can’t be used inside a single class. a) True b) False Answer: b Explanation: There is no restriction on the use of any type of member function inside a single class. Any type any number of times can be defined inside a class. The member functions can be used as required. 13. Which among the following is true? a) Member functions can never be private b) Member functions can never be protected c) Member functions can never be public d) Member functions can be defined in any access specifier Answer: d Explanation: The member functions can be defined inside any specifier. There is no restriction. The programmer can apply restrictions on its use by specifying the access specifier with the functions. 14. Which keyword is used to define the static member functions? a) static b) stop c) open d) state Answer: a Explanation: The static keyword is used to declare any static member function in a class. The static members become common
to each object of the class being created. They share the same values. 15. Which keyword is used to define the inline member function? a) no keyword required b) inline c) inlined d) line Answer: b Explanation: The inline keyword is used to defined the inline member functions in a class. The functions are implicitly made inline if defined inside the class body, but only if they doesn’t have any complex statement inside. All functions defined outside the class body must be mentioned with an explicit inline keyword.
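The five kinds of member functions mentioned above, sketched in one illustrative class (the names are invented; this is only a summary example, not a prescribed pattern):

#include <iostream>

class Widget {
    int id;
    static int count;                          // shared by all objects of the class
public:
    Widget(int i) : id(i) { ++count; }         // simple member function (constructor)
    int getId() const { return id; }           // const member: cannot modify data members
    inline void show() const { std::cout << id << '\n'; }   // inline member
    static int howMany() { return count; }     // static member: uses only static data
    friend void inspect(const Widget& w);      // friend: non-member with full access
};

int Widget::count = 0;

void inspect(const Widget& w) { std::cout << "id=" << w.id << '\n'; }  // may read private data

int main() {
    Widget a(1), b(2);
    std::cout << Widget::howMany() << '\n';    // called as className::functionName()
    a.show();
    inspect(b);
    return 0;
}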
This set of Object Oriented Programming (OOPs) using C++ Multiple Choice Questions & Answers (MCQs) focuses on “Abstract Class”. 1. Which among the following best describes abstract classes? a) If a class has more than one virtual function, it’s abstract class b) If a class have only one pure virtual function, it’s abstract class c) If a class has at least one pure virtual function, it’s abstract class d) If a class has all the pure virtual functions only, then it’s abstract class Answer: c Explanation: The condition for a class to be called abstract class is that it must have at least one pure virtual function. The keyword abstract must be used while defining abstract class in java. 2. Can abstract class have main() function defined inside it? a) Yes, depending on return type of main() b) Yes, always c) No, main must not be defined inside abstract class d) No, because main() is not abstract function Answer: b Explanation: This is a property of abstract class. It can define main() function inside it. There is no restriction on its definition and implementation. 3. If there is an abstract method in a class then, ________________ a) Class must be abstract class b) Class may or may not be abstract class c) Class is generic d) Class must be public Answer: a Explanation: It is a rule that if a class have even one abstract method, it must be an abstract class. If this rule was not made, the abstract methods would have got skipped to get defined in some places which are undesirable with the idea of abstract class. 4. If a class is extending/inheriting another abstract class having abstract method, then _______________________ a) Either implementation of method or making class abstract is mandatory b) Implementation of the method in derived class is mandatory c) Making the derived class also abstract is mandatory d) It’s not mandatory to implement the abstract method of parent class Answer: a Explanation: Either of the two things must be done, either implementation or declaration of class as abstract. This is done to ensure that the method intended to be defined by other classes gets defined at every possible class. 5. Abstract class A has 4 virtual functions. Abstract class B defines only 2 of those member functions as it extends class A. Class C extends class B and implements the other two member functions of class A. Choose the correct option below. a) Program won’t run as all the methods are not defined by B b) Program won’t run as C is not inheriting A directly c) Program won’t run as multiple inheritance is used d) Program runs correctly Answer: d Explanation: The program runs correctly. This is because even class B is abstract so it’s not mandatory to define all the virtual functions. Class C is not abstract but all the virtual functions have been implemented will that class. 6. Abstract classes can ____________________ instances. a) Never have b) Always have c) Have array of d) Have pointer of Answer: a Explanation: When an abstract class is defined, it won’t be having the implementation of at least one function. This will restrict the class to have any constructor. When the class doesn’t have constructor, there won’t be any instance of that class. 7. We ___________________ to an abstract class. a) Can create pointers b) Can create references c) Can create pointers or references d) Can’t create any reference, pointer or instance Answer: c
Explanation: Even though there can’t be any instance of abstract class. We can always create pointer or reference to abstract class. The member functions which have some implementation inside abstract itself can be used with these references. 8. Which among the following is an important use of abstract classes? a) Header files b) Class Libraries c) Class definitions d) Class inheritance Answer: b Explanation: The abstract classes can be used to create a generic, extensible class library that can be used by other programmers. This helps us to get some already implemented codes and functions that might have not been provided by the programming language itself. 9. Use of pointers or reference to an abstract class gives rise to which among the following feature? a) Static Polymorphism b) Runtime polymorphism c) Compile time Polymorphism d) Polymorphism within methods Answer: b Explanation: The runtime polymorphism is supported by reference and pointer to an abstract class. This relies upon base class pointer and reference to select the proper virtual function. 10. The abstract classes in java can _________________ a) Implement constructors b) Can’t implement constructor c) Can implement only unimplemented methods d) Can’t implement any type of constructor Answer: a Explanation: The abstract classes in java can define a constructor. Even though instance can’t be created. But in this way, only during constructor chaining, constructor can be called. When instance of concrete implementation class is created, it’s known as constructor chaining. 11. Abstract class can’t be final in java. a) True b) False Answer: a Explanation: If an abstract class is made final in java, it will stop the abstract class from being extended. And if the class is not getting extended, there won’t be another class to implement the virtual functions. Due to this contradicting fact, it can’t be final in java. 12. Can abstract classes have static methods (Java)? a) Yes, always b) Yes, but depends on code c) No, never d) No, static members can’t have different values Answer: a Explanation: There is no restriction on declaring static methods. The only condition is that the virtual functions must have some definition in the program. 13. It is _________________________ to have an abstract method. a) Not mandatory for an static class b) Not mandatory for a derived class c) Not mandatory for an abstract class d) Not mandatory for parent class Answer: c Explanation: Derived, parent and static classes can’t have abstract method (We can’t say what type of these classes is). And for abstract class it’s not mandatory to have abstract method. But if any abstract method is there inside a class, then class must be abstract type. 14. How many abstract classes can a single program contain? a) At most 1 b) At least 1 c) At most 127 d) As many as required
Answer: d Explanation: There is no restriction on the number of abstract classes that can be defined inside a single program. The programs can use as many abstract classes as required. But the functions with no body must be implemented. 15. Is it necessary that all the abstract methods must be defined from an abstract class? a) Yes, depending on code b) Yes, always c) No, never d) No, if function is not used, no definition is required Answer: b Explanation: That is the rule of programming language that each function declared, must have some definition. There can’t be some abstract method that remains undefined. Even if it’s there, it would result in compile time error.
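A minimal sketch of an abstract class with a pure virtual function, showing that instances are forbidden while pointers (and runtime polymorphism) are allowed, as the questions above describe (the class names are illustrative):

#include <iostream>

class Shape {                         // abstract: contains a pure virtual function
public:
    virtual double area() const = 0;  // pure virtual function
    virtual ~Shape() {}
};

class Circle : public Shape {
    double r;
public:
    Circle(double radius) : r(radius) {}
    double area() const override { return 3.14159 * r * r; }
};

int main() {
    // Shape s;                       // error: cannot declare an instance of an abstract class
    Shape* p = new Circle(2.0);       // pointer to an abstract class is allowed
    std::cout << p->area() << '\n';   // call resolved at runtime (polymorphism)
    delete p;
    return 0;
}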
This set of Object Oriented Programming (OOPs) using C++ Multiple Choice Questions & Answers (MCQs) focuses on ” Abstract Function”. 1. Which among the following best defines the abstract methods? a) Functions declared and defined in base class b) Functions only declared in base class c) Function which may or may not be defined in base class d) Function which must be declared in derived class Answer: b Explanation: The abstract functions must only be declared in base class. Their definitions are provided by the derived classes. It is a mandatory condition. 2. Which among the following is true? a) The abstract functions must be only declared in derived classes b) The abstract functions must not be defined in derived classes c) The abstract functions must be defined in base and derived class d) The abstract functions must be defined either in base or derived class Answer: a Explanation: The abstract functions can’t be defined in base class. They are to be defined in derived classes. It is a rule for abstract functions. 3. How are abstract functions different from the abstract functions? a) Abstract must not be defined in base class whereas virtual function can be defined b) Either of those must be defined in base class c) Different according to definition d) Abstract functions are faster Answer: a Explanation: The abstract functions are only declared in base class. Derived classes have to implement those functions in order to inherit that base class. The functions are always defined in derived classes only. 4. Which among the following is correct? a) Abstract functions should not be defined in all the derived classes b) Abstract functions should be defined only in one derived class c) Abstract functions must be defined in base class d) Abstract functions must be defined in all the derived classes Answer: d Explanation: The abstract function are only declared in base classes and then has to be defined in all the derived classes. This allows all the derived classes to define own definition of any function whose declaration in base class might be common to all the other derived classes. 5. It is ____________________ to define the abstract functions. a) Mandatory for all the classes in program b) Necessary for all the base classes c) Necessary for all the derived classes d) Not mandatory for all the derived classes Answer: c Explanation: The derived classes must define the abstract function of base class in their own body. This is a necessary condition. Because the abstract functions doesn’t contain any definition in base class and hence becomes mandatory for the derived class to define them. All the functions in a program must have some definition. 6. The abstract function definitions in derived classes is enforced at _________ a) Runtime b) Compile time c) Writing code time d) Interpreting time Answer: b Explanation: When the program is compiled, these definitions are checked if properly defined. This compiler also ensure that the function is being defined by all the derived classes. Hence we get a compile time error if not done. 7. What is this feature of enforcing definitions of abstract function at compile time called? a) Static polymorphism b) Polymorphism c) Dynamic polymorphism d) Static or dynamic according to need
Answer: c Explanation: The feature is known as Dynamic polymorphism. Because the definitions are resolved at runtime. Even though the definitions are checked at compile time, they are resolved at runtime only. 8. What is the syntax for using abstract method? a) <access-modifier>abstract<return-type>method_name (parameter) b) abs<return-type>method name (parameter) c) <access-modifier>abstract return-type method name (parameter) d) <access-modifier>abstract <returning> method name (parameter) Answer: a Explanation: The syntax must firstly contain the access modifier. Then the keyword abstract is written to mention clearly to the compiler that it is an abstract method. Then prototype of the function with return type, function name and parameters. 9. If a function declared as abstract in base class doesn’t have to be defined in derived class then ______ a) Derived class must define the function anyhow b) Derived class should be made abstract class c) Derived class should not derive from that base class d) Derived class should not use that function Answer: b Explanation: If the function that is not to be defined in derived class but is declared as abstract in base class then the derived class must be made an abstract class. This will make the concept mandatory that the derived class must have one subclass to define that method. 10. Static methods can’t be made abstract in java. a) True b) False Answer: a Explanation: The abstract functions can’t be made static in a program. If those are made static then the function will be a property of class rather than each object. In turn ever object or derived class must use the common definition given in the base class. But abstract functions can’t be defined in the base class. Hence not possible. 11. Which among the following is true? a) Abstract methods can be static b) Abstract methods can be defined in derived class c) Abstract methods must not be static d) Abstract methods can be made static in derived class Answer: c Explanation: The abstract methods can never be made static. Even if it is in derived class, it can’t be made static. If this happens, then all the subsequent sub classes will have a common definition of abstract function which is not desirable. 12. Which among the following is correct for abstract methods? a) It must have different prototype in the derived class b) It must have same prototype in both base and derived class c) It must have different signature in derived class d) It must have same return type only Answer: b Explanation: The prototype must be the same. This is to override the function declared as abstract in base class. Or else it will not be possible to override the abstract function of base class and hence we get a compile time error. 13. If a class have all the abstract methods the class will be known as ___________ a) Abstract class b) Anonymous class c) Base class d) Derived class Answer: a Explanation: The classes containing all the abstract methods are known as abstract classes. And the abstract classes can never have any normal function with definition. Hence known as abstract class. 14. The abstract methods can never be ___________ in a base class. a) Private b) Protected c) Public d) Default Answer: a
Explanation: The base class must not contain the abstract methods. The methods have to be derived and defined in derived class. But if it is made private it can’t be inherited. Hence we can’t declare it as a private member. 15. The abstract method definition can be made ___________ in derived class. a) Private b) Protected c) Public d) Private, public, or protected Answer: d Explanation: The derived class implements the definition of the abstract methods of base class. Those can be made private in derived class if security is needed. There won’t be any problem in declaring it as private.
This set of Object Oriented Programming (OOPs) using C++ Multiple Choice Questions & Answers (MCQs) focuses on “Abstraction”. 1. Which among the following best defines abstraction? a) Hiding the implementation b) Showing the important data c) Hiding the important data d) Hiding the implementation and showing only the features Answer: d Explanation: It includes hiding the implementation part and showing only the required data and features to the user. It is done to hide the implementation complexity and details from the user. And to provide a good interface in programming. 2. Hiding the implementation complexity can ____________ a) Make the programming easy b) Make the programming complex c) Provide more number of features d) Provide better features Answer: a Explanation: It can make programming easy. The programming need not know how the inbuilt functions are working but can use those complex functions directly in the program. It doesn’t provide more number of features or better features. 3. Class is _________ abstraction. a) Object b) Logical c) Real d) Hypothetical Answer: b Explanation: Class is logical abstraction because it provides a logical structure for all of its objects. It gives an overview of the features of an object. 4. Object is ________ abstraction. a) Object b) Logical c) Real d) Hypothetical Answer: c Explanation: Object is real abstraction because it actually contains those features of class. It is the implementation of overview given by class. Hence the class is logical abstraction and its object is real. 5. Abstraction gives higher degree of ________ a) Class usage b) Program complexity c) Idealized interface d) Unstable interface Answer: c Explanation: It is to idealize the interface. In this way the programmer can use the programming features more efficiently and can code better. It can’t increase the program complexity, as the feature itself is made to hide it. 6. Abstraction can apply to ____________ a) Control and data b) Only data c) Only control d) Classes Answer: a Explanation: Abstraction applies to both. Control abstraction involves use of subroutines and control flow abstraction. Data abstraction involves handling pieces of data in meaningful ways. 7. Which among the following can be viewed as combination of abstraction of data and code. a) Class b) Object c) Inheritance d) Interfaces Answer: b Explanation: Object can be viewed as abstraction of data and code. It uses data members and their functioning as data
1. Which of the following is not an example of Social Media?
- Twitter
- Google
- Instagram
- YouTube
Answer
Google
2. By 2025, the volume of digital data will increase to
- TB
- YB
- ZB
- EB
Answer
ZB
3. Data Analysis is a process of
- inspecting data
- cleaning data
- transforming data
- All of the above
Answer
All of the above
4. Does Facebook uses “Big Data ” to perform the concept of Flashback?
- True
- False
Answer
True
5. Which of the following is not a major data analysis approach?
- Data Mining
- Predictive Intelligence
- Business Intelligence
- Text Analytics
Answer
Predictive Intelligence
6. The Process of describing the data that is huge and complex to store and process is known as
- Analytics
- Data mining
- Big data
- Data warehouse
Answer
Big data
7. How many main statistical methodologies are used in data analysis?
- 2
- 3
- 4
- 5
Answer
2
8. In descriptive statistics, data from the entire population or a sample is summarized with ?
- Integer descriptor
- floating descriptor
- numerical descriptor
- decimal descriptor
Answer
numerical descriptor
9. ____ have a structure but cannot be stored in a database.
- Structured
- Semi Structured
- Unstructured
- None of these
Answer
None of these
10. Data generated from online transactions is one of the examples of the volume of big data
- TRUE
- FALSE
Answer
TRUE
11. Velocity is the speed at which the data is processed
- True
- False
Answer
False
12. Value tells the trustworthiness of data in terms of quality and accuracy
- TRUE
- FALSE
Answer
False
13. Hortonworks was introduced by Cloudera and owned by Yahoo
- True
- False
Answer
False
14. ____ refers to the ability to turn your data into something useful for business
- Velocity
- variety
- Value
- Volume
Answer
Value
15. GFS consists of ____ Master and _____ Chunk Servers
- Single, Single
- Multiple, Single
- Single, Multiple
- Multiple, Multiple
Answer
Single, Multiple
16. Data Analysis is defined by the statistician?
- William S.
- Hans Peter Luhn
- Gregory Piatetsky-Shapiro
- John Tukey
Answer
John Tukey
17. Files are divided into ____ sized Chunks.
- Static
- Dynamic
- Fixed
- Variable
Answer
Fixed
18. _____ is an open source framework for storing data and running application on clusters of commodity hardware.
- HDFS
- Hadoop
- MapReduce
- Cloud
Answer
Hadoop
19. How much data (in MB) does HDFS store in each block by default, which can be scaled at any time?
- 32
- 64
- 128
- 256
Answer
128
20. Hadoop Map Reduce allows you to perform distributed parallel processing on large volumes of data quickly and efficiently.
- True
- False
Answer
True
21. Google Introduced Map Reduce Programming model in 2004
- True
- False
Answer
True
22. Hadoop YARN is used for Cluster Resource Management in Hadoop Ecosystem
- True
- False
Answer
True
23. _____ phase sorts the data & _____ creates logical clusters.
- Reduce, YARN
- MAP, YARN
- REDUCE, MAP
- MAP, REDUCE
Answer
MAP, REDUCE
24. There is only one operation between Mapping and Reducing
- True
- False
Answer
True
25. Which of the following is true about hypothesis testing?
- answering yes/no questions about the data
- estimating numerical characteristics of the data
- describing associations within the data
- modeling relationships within the data
Answer
answering yes/no questions about the data
26. The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities
- True
- False
Answer
True
27. ____ is one of the factors considered before adopting Big Data technology
- Validation
- Verification
- Data
- Design
Answer
Validation
28. ____ analytics is used for improving supply chain management to optimize stock management, replenishment, and forecasting
- Descriptive
- Diagnostic
- Predictive
- Prescriptive
Answer
Predictive
29. Which among the following is not a data mining and analytical application?
- profile matching
- social network analysis
- facial recognition
- Filtering
Answer
Filtering
30. _____ occurs as a result of data accessibility, data latency, data availability, or limits on bandwidth in relation to the size of inputs
- Computation-restricted throttling
- Large data volumes
- Data throttling
- Data Parallelization
Answer
Data throttling
31. As an example, an expectation of using a recommendation engine would be to increase same-customer sales by adding more items into the market basket. Which expected benefit does this illustrate?
- Lowering costs
- Increasing revenues
- Increasing productivity
- Reducing risk
Answer
Increasing revenues
32. Which characteristic of a storage subsystem allows it to support massive data volumes of increasing size?
- Extensibility
- Fault tolerance
- Scalability
- High-speed I/O capacity
Answer
Scalability
33. _____ provides performance through distribution of data and fault tolerance through replication
- HDFS
- PIG
- HIVE
- HADOOP
Answer
HDFS
34. ______ is a programming model for writing applications that can process Big Data in parallel on multiple nodes.
- HDFS
- MAP REDUCE
- HADOOP
- HIVE
Answer
MAP REDUCE
35. ____ takes the grouped key-value paired data as input and runs a Reducer function on each one of them.
- MAPPER
- REDUCER
- COMBINER
- PARTITIONER
Answer
REDUCER
36. ____ is a type of local Reducer that groups similar data from the map phase into identifiable sets.
- MAPPER
- REDUCER
- COMBINER
- PARTITIONER
Answer
COMBINER
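To see how the Mapper, Combiner, and Reducer in the three questions above fit together, here is a minimal, framework-free Python sketch of the classic word-count job. Plain functions stand in for Hadoop's Mapper/Combiner/Reducer classes; the function names and sample input are illustrative, not Hadoop's actual API.

```python
from collections import defaultdict

def mapper(line):
    # Emit a (word, 1) pair for every word in the input line.
    return [(word, 1) for word in line.split()]

def combiner(pairs):
    # Local reducer: pre-aggregates counts produced by one mapper
    # to reduce the volume of data shuffled across the network.
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def reducer(word, counts):
    # Receives every count for one key (word) after the shuffle/sort phase.
    return word, sum(counts)

if __name__ == "__main__":
    lines = ["big data big ideas", "big data tools"]
    # Map (and locally combine) each input split independently.
    mapped = [pair for line in lines for pair in combiner(mapper(line))]
    # Shuffle/sort: group the intermediate values by key.
    grouped = defaultdict(list)
    for word, count in mapped:
        grouped[word].append(count)
    # Reduce each group to the final count.
    print(dict(reducer(w, c) for w, c in grouped.items()))
    # -> {'big': 3, 'data': 2, 'ideas': 1, 'tools': 1}
```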
37. While installing Hadoop, how many XML configuration files are edited, and which are they?
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
- yarn-site.xml
Answer
All four: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml
38. Movie Recommendation systems are an example of
- Classification
- Clustering
- Reinforcement Learning
- Regression
- 2 only
- 1 and 3
- 1 and 2
- 2 and 3
Answer
1 and 3
39. Sentiment Analysis is an example of
- Regression
- Classification
- clustering
- Reinforcement Learning
- 1, 2 and 4
- 1, 2 and 3
- 1 and 3
- 1 and 2
Answer
1, 2 and 4
1. The branch of statistics which deals with development of particular statistical methods is classified as
- industry statistics
- economic statistics
- applied statistics
- applied statistics
Show Answer
applied statistics
2. Which of the following is true about regression analysis?
- answering yes/no questions about the data
- estimating numerical characteristics of the data
- modeling relationships within the data
- describing associations within the data
Show Answer
modeling relationships within the data
3. Text Analytics, also referred to as Text Mining?
- True
- False
- Can be true or False
- Can not say
Show Answer
TRUE
4. What is a hypothesis?
- A statement that the researcher wants to test through the data collected in a study.
- A research question the results will answer.
- A theory that underpins the study.
- A statistical method for calculating the extent to which the results could have happened by chance.
Show Answer
A statement that the researcher wants to test through the data collected in a study.
5. What is the cyclical process of collecting and analyzing data during a single research study called?
- Interim Analysis
- Inter analysis
- inter item analysis
- constant analysis
Show Answer
Interim Analysis
6. The process of quantifying data is referred to as ____
- Topology
- Digramming
- Enumeration
- coding
Show Answer
Enumeration
7. An advantage of using computer programs for qualitative data is that they _
- Can reduce time required to analyse data (i.e., after the data are transcribed)
- Help in storing and organising data
- Make many procedures available that are rarely done by hand due to time constraints
- All of the above
Show Answer
All of the Above
8. Boolean operators are words that are used to create logical combinations.
- True
- False
Show Answer
True
9. ______ are the basic building blocks of qualitative data.
- Categories
- Units
- Individuals
- None of the above
Show Answer
Categories
10. This is the process of transforming qualitative research data from written interviews or field notes into typed text.
- Segmenting
- Coding
- Transcription
- Mnemoning
Show Answer
Transcription
11. A challenge of qualitative data analysis is that it often includes data that are unwieldy and complex; it is a major challenge to make sense of the large pool of data.
- True
- False
Show Answer
True
12. Hypothesis testing and estimation are both types of descriptive statistics.
- True
- False
Show Answer
False
13. A set of data organised in a participants(rows)-by-variables(columns) format is known as a “data set.”
- True
- False
Show Answer
True
14. A graph that uses vertical bars to represent data is called a ___
- Line graph
- Bar graph
- Scatterplot
- Vertical graph
Show Answer
Bar graph
15. ____ are used when you want to visually examine the relationship between two quantitative variables.
- Bar graph
- pie graph
- line graph
- Scatterplot
Show Answer
Scatterplot
16. The denominator (bottom) of the z-score formula is
- The standard deviation
- The difference between a score and the mean
- The range
- The mean
Show Answer
The standard deviation
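As a quick worked check of that answer, the standard z-score formula divides the deviation of a score from the mean by the standard deviation (the numbers below are illustrative):

$$ z = \frac{x - \mu}{\sigma}, \qquad \text{e.g. } x = 70,\ \mu = 60,\ \sigma = 5 \ \Rightarrow\ z = \frac{70 - 60}{5} = 2 $$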
17. Which of these distributions is used for a testing hypothesis?
- Normal Distribution
- Chi-Squared Distribution
- Gamma Distribution
- Poisson Distribution
Show Answer
Chi-Squared Distribution
18. A statement made about a population for testing purpose is called?
- Statistic
- Hypothesis
- Level of Significance
- Test-Statistic
Show Answer
Hypothesis
19. If the assumed hypothesis is tested for rejection considering it to be true is called?
- Null Hypothesis
- Statistical Hypothesis
- Simple Hypothesis
- Composite Hypothesis
Show Answer
Null Hypothesis
20. If the null hypothesis is false then which of the following is accepted?
- Null Hypothesis
- Positive Hypothesis
- Negative Hypothesis
- Alternative Hypothesis.
Show Answer
Alternative Hypothesis.
21. Alternative Hypothesis is also called as?
- Composite hypothesis
- Research Hypothesis
- Simple Hypothesis
- Null Hypothesis
Show Answer
Research Hypothesis
- 0
- 1
- 2
- 3
Answer
1
2. For two runs of K-Means clustering, is it expected to get the same clustering results?
- Yes
- No
Answer
No
3. Which of the following algorithm is most sensitive to outliers?
- K-means clustering algorithm
- K-medians clustering algorithm
- K-modes clustering algorithm
- K-medoids clustering algorithm
Answer
K-means clustering algorithm
4. The discrete variables and continuous variables are two types of
- Open end classification
- Time series classification
- Qualitative classification
- Quantitative classification
Answer
Quantitative classification
5. Bayesian classifiers is
- A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
- Any mechanism employed by a learning system to constrain the search space of a hypothesis
- An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
- None of these
Answer
A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
6. Classification accuracy is
- A subdivision of a set of examples into a number of classes
- Measure of the accuracy, of the classification of a concept that is given by a certain theory
- The task of assigning a classification to a set of examples
- None of these
Answer
Measure of the accuracy, of the classification of a concept that is given by a certain theory
7. Euclidean distance measure is
- A stage of the KDD process in which new data is added to the existing selection.
- The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
- The distance between two points as calculated using the Pythagoras theorem
- none of above
Answer
The distance between two points as calculated using the Pythagoras theorem
8. Hybrid is
- Combining different types of method or information
- Approach to the design of learning algorithms that is structured along the lines of the theory of evolution.
- Decision support systems that contain an information base filled with the knowledge of an expert formulated in terms of if-then rules.
- none of above
Answer
Combining different types of method or information
9. Decision trees use ______ , in that they always choose the option that seems the best available at that moment.
- Greedy Algorithms
- divide and conquer
- Backtracking
- Shortest path algorithm
Answer
Greedy Algorithms
10. Discovery is
- It is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information).
- The process of extracting implicit, previously unknown and potentially useful information from data
- An extremely complex molecule that occurs in human chromosomes and that carries genetic information in the form of genes.
- None of these
Answer
The process of extracting implicit, previously unknown and potentially useful information from data
11. Hidden knowledge referred to
- A set of databases from different vendors, possibly using different database paradigms
- An approach to a problem that is not guaranteed to work but performs well in most cases
- Information that is hidden in a database and that cannot be recovered by a simple SQL query.
- None of these
Answer
Information that is hidden in a database and that cannot be recovered by a simple SQL query.
12. Decision trees cannot handle categorical attributes with many distinct values, such as country codes for telephone numbers.
- True
- False
Answer
False
13. Enrichment is
- A stage of the KDD process in which new data is added to the existing selection
- The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
- The distance between two points as calculated using the Pythagoras theorem.
- None of these
Answer
A stage of the KDD process in which new data is added to the existing selection
14. _____ are easy to implement and can execute efficiently even without prior knowledge of the data; they are among the most popular algorithms for classifying text documents.
- ID3
- Naïve Bayes classifiers
- CART
- None of above
Answer
Naïve Bayes classifiers
15. High entropy means that the partitions in classification are
- Pure
- Not Pure
- Useful
- Useless
Answer
Not Pure
16. Which of the following statements about Naive Bayes is incorrect?
- Attributes are equally important.
- Attributes are statistically dependent of one another given the class value.
- Attributes are statistically independent of one another given the class value.
- Attributes can be nominal or numeric
Answer
Attributes are statistically dependent of one another given the class value.
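The conditional-independence assumption behind that answer is the standard Naive Bayes factorization: given the class value C, the attributes x1, ..., xn are treated as statistically independent of one another:

$$ P(C \mid x_1, \dots, x_n) \;\propto\; P(C)\prod_{i=1}^{n} P(x_i \mid C) $$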
17. The maximum value for entropy depends on the number of classes so if we have 8 Classes what will be the max entropy.
- Max Entropy is 1
- Max Entropy is 2
- Max Entropy is 3
- Max Entropy is 4
Answer
Max Entropy is 3
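A short worked check of that answer, using the standard entropy formula: entropy is maximized when all K classes are equally likely, so

$$ H_{\max} = -\sum_{k=1}^{K} \tfrac{1}{K}\log_2 \tfrac{1}{K} = \log_2 K = \log_2 8 = 3 \text{ bits} $$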
18. Point out the wrong statement.
- k-nearest neighbor is same as k-means
- k-means clustering is a method of vector quantization
- k-means clustering aims to partition n observations into k clusters
- none of the mentioned
Answer
k-nearest neighbor is same as k-means
19. Consider the following example: “How can we divide a set of articles such that those articles have the same theme (we do not know the theme of the articles ahead of time)?” Is this:
- Clustering
- Classification
- Regression
- None of these
Answer
Clustering
20. Can we use K Mean Clustering to identify the objects in video?
- Yes
- No
Answer
Yes
21. Clustering techniques are ______ in the sense that the data scientist does not determine, in advance, the labels to apply to the clusters.
- Unsupervised
- supervised
- Reinforcement
- Neural network
Answer
Unsupervised
22. _____ metric is examined to determine a reasonably optimal value of k.
- Mean Square Error
- Within Sum of Squares (WSS)
- Speed
- None of these
Answer
Within Sum of Squares (WSS)
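A minimal sketch of how WSS is typically examined to choose k (the elbow method), assuming scikit-learn is available; `inertia_` is scikit-learn's name for the within-cluster sum of squares, and the data below is illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative 2-D data: three loose blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 5, 10)])

# Compute WSS for a range of k and look for the "elbow" where
# the decrease in WSS starts to flatten out.
for k in range(1, 7):
    wss = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(wss, 1))
```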
23. If an itemset is considered frequent, then any subset of the frequent itemset must also be frequent.
- Apriori Property
- Downward Closure Property
- Either 1 or 2
- Both 1 and 2
Answer
Both 1 and 2
24. if {bread,eggs,milk} has a support of 0.15 and {bread,eggs} also has a support of 0.15, the confidence of rule {bread,eggs}→{milk} is
- 0
- 1
- 2
- 3
Answer
1
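A worked check of that answer, using the standard definition of confidence for an association rule:

$$ \mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)} = \frac{\mathrm{supp}(\{bread, eggs, milk\})}{\mathrm{supp}(\{bread, eggs\})} = \frac{0.15}{0.15} = 1 $$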
25. Confidence is a measure of how X and Y are really related rather than coincidentally happening together.
- True
- False
Answer
False
26. ______ recommend items based on similarity measures between users and/or items.
- Content Based Systems
- Hybrid System
- Collaborative Filtering Systems
- None of these
Answer
Collaborative Filtering Systems
27. There are ______ major classifications of Collaborative Filtering mechanisms
- 1
- 2
- 3
- none of above
Answer
2
28. Movie Recommendation to people is an example of
- User Based Recommendation
- Item Based Recommendation
- Knowledge Based Recommendation
- content based recommendation
Answer
Item Based Recommendation
29. _____ recommenders rely on an explicitly defined set of recommendation rules
- Constraint Based
- Case Based
- Content Based
- User Based
Answer
Constraint Based
30. Parallelized hybrid recommender systems operate dependently of one another and produce separate recommendation lists.
- True
- False
Answer
False
- Address
- Contents
- Both a and b
- none
Show Answer
Both a and b
2. A collection of lines that connects several devices is called
- Bus
- Peripheral connection wires
- Both a and b
- internal wires
Show Answer
Bus
3. Conventional architectures coarsely comprise of a
- Processor
- Memory System
- Data path
- All of the above
Show Answer
All of the above
4. VLIW processors rely on
- Compile time analysis
- Initial time analysis
- Final time analysis
- id time analysis
Show Answer
Compile time analysis
5. HPC is not used in high span bridges
- True
- False
Show Answer
False
6. The access time of memory is …………… the time required for performing any single CPU operation.
- longer than
- shorter than
- negligible than
- same as
Show Answer
longer than
7. Data intensive applications utilize_
- High aggregate throughput
- High aggregate network bandwidth
- high processing and memory system performance
- none of above
Show Answer
High aggregate throughput
8. Memory system performance is largely captured by_
- Latency
- bandwidth
- both a and b
- none of above
Show Answer
both a and b
9. A processor performing fetch or decoding of different instruction during the execution of another instruction is called __ .
- Super-scaling
- Pipe-lining
- Parallel Computation
- none of above
Show Answer
Pipe-lining
10. For a given FINITE number of instructions to be executed, which architecture of the processor provides for a faster execution ?
- ISA
- ANSA
- Super-scalar
- All of the above
Show Answer
Super-scalar
11. HPC works out to be economical.
- True
- false
Show Answer
True
12. High Performance Computing of the Computer System tasks are done by
- Node Cluster
- Network Cluster
- Beowulf Cluster
- Stratified Cluster
Show Answer
Beowulf Cluster
13. Octa Core Processors are the processors of the computer system that contains
- 2 Processors
- 4 Processors
- 6 Processors
- 8 Processors
Show Answer
8 Processors
14. Parallel computing uses _ execution
- sequential
- unique
- simultaneous
- None of above
Show Answer
simultaneous
15. Which of the following is NOT a characteristic of parallel computing?
- Breaks a task into pieces
- Uses a single processor or computer
- Simultaneous execution
- May use networking
Show Answer
Uses a single processor or computer
16. Which of the following is true about parallel computing performance?
- Computations use multiple processors
- There is an increase in speed
- The increase in speed is loosely tied to the number of processor or computers used
- All of the answers are correct.
Show Answer
All of the answers are correct.
17. __ leads to concurrency.
- Serialization
- Parallelism
- Serial processing
- Distribution
Show Answer
Parallelism
18. MIPS stands for?
- Mandatory Instructions/sec
- Millions of Instructions/sec
- Most of Instructions/sec
- Many Instructions / sec
Show Answer
Millions of Instructions/sec
19. Which MIMD systems are best scalable with respect to the number of processors
- Distributed memory computers
- consume systems
- Symmetric multiprocessors
- None of above
Show Answer
Distributed memory computers
20. To which class of systems does the von Neumann computer belong?
- SIMD (Single Instruction Multiple Data)
- MIMD (Multiple Instruction Multiple Data)
- MISD (Multiple Instruction Single Data)
- SISD (Single Instruction Single Data)
Show Answer
SISD (Single Instruction Single Data)
21. Which of the architecture is power efficient?
- CISC
- RISC
- ISA
- IANA
Show Answer
RISC
22. Pipe-lining is a unique feature of _.
- RISC
- CISC
- ISA
- IANA
Show Answer
RISC
23. The computer architecture aimed at reducing the time of execution of instructions is __.
- RISC
- CISC
- ISA
- IANA
Show Answer
RISC
24. Type of microcomputer memory is
- processor memory
- primary memory
- secondary memory
- All of above
Show Answer
All of above
25. A pipeline is like_
- Overlaps various stages of instruction execution to achieve performance.
- House pipeline
- Both a and b
- A gas line
Show Answer
Overlaps various stages of instruction execution to achieve performance.
26. Scheduling of instructions is determined by_
- True Data Dependency
- Resource Dependency
- Branch Dependency
- All of above
Show Answer
All of above
27. The fraction of data references satisfied by the cache is called_
- Cache hit ratio
- Cache fit ratio
- Cache best ratio
- none of above
Show Answer
Cache hit ratio
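For context, the hit ratio h (cache hits divided by total memory references) feeds directly into the usual textbook estimate of effective memory access time; the symbols below are illustrative:

$$ t_{\text{eff}} = h\, t_{\text{cache}} + (1 - h)\, t_{\text{memory}} $$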
28. A single control unit that dispatches the same Instruction to various processors is__
- SIMD
- SPMD
- MIMD
- none of above
Show Answer
SIMD
29. The primary forms of data exchange between parallel tasks are_
- Accessing a shared data space
- Exchanging messages.
- Both A and B
- none of above
Show Answer
Both A and B
30. Switches map a fixed number of inputs to outputs.
- True
- False
Show Answer
True
1. The First step in developing a parallel algorithm is_
- To Decompose the problem into tasks that can be executed concurrently
- Execute directly
- Execute indirectly
- None of Above
Answer
To Decompose the problem into tasks that can be executed concurrently
2. The number of tasks into which a problem is decomposed determines its_
- Granularity
- Priority
- Modernity
- None of Above
Answer
Granularity
3. The length of the longest path in a task dependency graph is called_
- the critical path length
- the critical data length
- the critical bit length
- None of Above
Answer
the critical path length
4. The graph of tasks (nodes) and their interactions/data exchange (edges)_
- Is referred to as a task interaction graph
- Is referred to as a task Communication graph
- Is referred to as a task interface graph
- None of Above
Answer
Is referred to as a task interaction graph
5. Mappings are determined by_
- task dependency
- task interaction graphs
- Both A and B
- None of Above
Answer
Both A and B
6. Decomposition Techniques are_
- recursive decomposition
- data decomposition
- exploratory decomposition
- speculative decomposition
- All of above
Answer
All of above
7. The Owner Computes rule generally states that the process assigned a particular data item is responsible for _
- All computation associated with it
- Only one computation
- Only two computation
- Only occasionally computation
Answer
All computation associated with it
8. A simple application of exploratory decomposition is_
- The solution to a 15 puzzle
- The solution to 20 puzzle
- The solution to any puzzle
- None of Above
Answer
The solution to a 15 puzzle
9. Speculative Decomposition consist of _
- conservative approaches
- optimistic approaches
- Both A and B
- only B
Answer
Both A and B
10. task characteristics include:
- Task generation.
- Task sizes.
- Size of data associated with tasks.
- All of above
Answer
All of above
11. What is a high performance multi-core processor that can be used to accelerate a wide variety of applications using parallel computing.
- CLU
- GPU
- CPU
- DSP
Answer
GPU
12. What is GPU?
- Grouped Processing Unit
- Graphics Processing Unit
- Graphical Performance Utility
- Graphical Portable Unit
Answer
Graphics Processing Unit
13. The code that runs on a GPU, known as a GRID, consists of a set of
- 32 Thread
- 32 Block
- Unit Block
- Thread Block
Answer
Thread Block
14. Interprocessor communication takes place via
- Centralized memory
- Shared memory
- Message passing
- Both A and B
Answer
Both A and B
15. Decomposition into a large number of tasks results in coarse-grained decomposition
- True
- False
Answer
False
16. Relevant task characteristics include
- Task generation.
- Task sizes
- Size of data associated with tasks
- Overhead
- both A and B
Answer
both A and B
17. The fetch and execution cycles are interleaved with the help of __
- Modification in processor architecture
- Clock
- Special unit
- Control unit
Answer
Clock
18. The processor of system which can read /write GPU memory is known as
- kernal
- device
- Server
- Host
Answer
Host
19. Increasing the granularity of decomposition and utilizing the resulting concurrency to perform more tasks in parallel decreases performance.
- TRUE
- FALSE
Answer
FALSE
20. If there is dependency between tasks, it implies there is no need of interaction between them.
- TRUE
- FALSE
Answer
FALSE
21. Parallel quick sort is example of task parallel model
- TRUE
- FALSE
Answer
TRUE
22. True Data Dependency is
- The result of one operation is an input to the next.
- Two operations require the same resource.
Answer
The result of one operation is an input to the next.
23. What is Granularity ?
- The size of database
- The size of data item
- The size of record
- The size of file
Answer
The size of data item
24. In coarse-grained parallelism, a program is split into …………………… task and ……………………… Size
- Large tasks , Smaller Size
- Small Tasks , Larger Size
- Small Tasks , Smaller Size
- Equal task, Equal Size
Answer
Large tasks , Smaller Size
1. The primary and essential mechanism to support the sparse matrices is
- Gather-scatter operations
- Gather operations
- Scatter operations
- Gather-scatter technique
Answer
Gather-scatter operations
2. In the gather operation, a single node collects a ———
- Unique message from each node
- Unique message from only one node
- Different message from each node
- None of Above
Answer
Unique message from each node
3. In the scatter operation, a single node sends a ————
- Unique message of size m to every other node
- Different message of size m to every other node
- Different message of different size m to every other node
- All of Above
Answer
Unique message of size m to every other node
4. Is all-to-all broadcasting the same as all-to-all personalized communication?
- Yes
- No
Answer
No
5. Is the scatter operation the same as broadcast?
- Yes
- No
Answer
No
6. All-to-all personalized communication is also known as
- Total Exchange
- Personal Message
- Scatter
- Gather
Answer
Total Exchange
7. In which way is the scatter operation different from broadcast?
- Message size
- Number of nodes
- Same
- None of above
Answer
Message size
8. The gather operation is exactly the _ of the scatter operation
- Inverse
- Reverse
- Multiple
- Same
Answer
Inverse
9. The gather operation is exactly the inverse of the_
- Scatter operation
- Broadcast operation
- Prefix Sum
- Reduction operation
Answer
Scatter operation
10. The dual of one-to-all broadcast is all-to-one reduction. True or False?
- TRUE
- FALSE
Answer
TRUE
11. A binary tree in which processors are (logically) at the leaves and internal nodes are routing nodes.
- TRUE
- FALSE
Answer
TRUE
12. Group communication operations are built using point-to-point messaging primitives
- TRUE
- FALSE
Answer
TRUE
13. Communicating a message of size m over an uncongested network takes time ts + tw*m
- True
- False
Answer
True
14. Parallel programs: Which speedup could be achieved according to Amdahl´s law for infinite number of processors if 5% of a program is sequential and the remaining part is ideally parallel?
- Infinite speedup
- 5
- 20
- None of above
Answer
20
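A worked derivation of that answer from Amdahl's law, with serial fraction f = 0.05:

$$ S(p) = \frac{1}{f + \frac{1-f}{p}} \;\longrightarrow\; \frac{1}{f} = \frac{1}{0.05} = 20 \quad \text{as } p \to \infty $$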
15. Shift register that performs a circular shift is called
- Invalid Counter
- Valid Counter
- Ring
- Undefined
Answer
Ring
16. 8 bit information can be stored in
- 2 Registers
- 4 Registers
- 6 Registers
- 8 Registers
Answer
8 Registers
17. The result of prefix expression * / b + – d a c d, where a = 3, b = 6, c = 1, d = 5 is
- 0
- 5
- 10
- 8
Answer
10
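A step-by-step evaluation of that prefix expression (expanding it recursively) with a = 3, b = 6, c = 1, d = 5:

$$ *\; /\; b\; +\; -\; d\; a\; c\; d \;=\; \big(b / ((d - a) + c)\big) \times d \;=\; \big(6 / ((5 - 3) + 1)\big) \times 5 \;=\; 2 \times 5 \;=\; 10 $$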
18. The height of a binary tree is the maximum number of edges in any root to leaf path. The maximum number of nodes in a binary tree of height h is?
- 2^h - 1
- 2^(h-1) - 1
- 2^(h+1) - 1
- 2*(h+1)
Answer
2^(h+1) - 1
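That count follows from summing the nodes level by level in a full binary tree of height h:

$$ N_{\max} = \sum_{i=0}^{h} 2^{i} = 2^{h+1} - 1, \qquad \text{e.g. } h = 3 \Rightarrow 1 + 2 + 4 + 8 = 15 = 2^{4} - 1 $$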
19. A hypercube has_
- 2^d nodes
- 2d nodes
- 2n Nodes
- N Nodes
Answer
2^d nodes
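As a quick check: a d-dimensional hypercube has one node for every d-bit label, so

$$ \text{nodes} = 2^{d}, \qquad \text{e.g. } d = 3 \Rightarrow 2^{3} = 8 \text{ nodes, each with } d = 3 \text{ neighbours} $$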
20. The Prefix Sum Operation can be implemented using the_
- All-to-all broadcast kernel
- All-to-one broadcast kernel
- One-to-all broadcast Kernel
- Scatter Kernel
Answer
All-to-all broadcast kernel
21.In the scatter operation_
- Single node send a unique message of size m to every other node
- Single node send a same message of size m to every other node
- Single node send a unique message of size m to next node
- None of Above
Answer
Single node send a unique message of size m to every other node
22. In All-to-All Personalized Communication Each node has a distinct message of size m for every other node
- True
- False
Answer
True
23. A binary tree in which processors are (logically) at the leaves and internal nodes are
routing nodes.
- True
- False
Answer
True
24. In All-to-All Broadcast each processor is the source as well as destination.
- True
- False
Answer
True
Unit 1
1. Data Analysis is defined by the statistician?
a. William S. b. Hans Peter Luhn c. Gregory Piatetsky-Shapiro d. John Tukey
Ans D
2. What is classification?
a) deciding what features to use in a pattern recognition problem b) deciding what class an input pattern belongs to c) deciding what type of neural network to use d) none of the mentioned
Ans. B
3. Data in ___________ bytes size is called Big Data.
A. Tera B. Giga C. Peta D. Meta
Ans : C
Explanation: Data in Petabytes, i.e. 10^15 bytes in size, is called Big Data.
4. How many V's of Big Data are there?
A. 2 B. 3 C. 4 D. 5
Ans : D
Explanation: Big Data was defined by the “3Vs” but now there are “5Vs” of Big Data which are Volume, Velocity, Variety, Veracity, Value
5. Transaction data of the bank is?
A. structured data B. unstructured data
C. Both A and B D. None of the above
Ans : A
Explanation: Data which can be saved in tables is structured data, like the transaction data of the bank.
6. In how many forms can Big Data be found?
A. 2 B. 3 C. 4 D. 5
Ans : B
Explanation: Big Data can be found in three forms: Structured, Unstructured and Semi-structured.
7. Which of the following are Benefits of Big Data Processing?
A. Businesses can utilize outside intelligence while taking decisions B. Improved customer service C. Better operational efficiency D. All of the above
Ans : D
Explanation: All of the above are Benefits of Big Data Processing.
8. Which of the following are incorrect Big Data Technologies?
A. Apache Hadoop B. Apache Spark C. Apache Kafka D. Apache Pytarch
Ans : D
Explanation: Apache Pytarch is not a Big Data technology.
9. The overall percentage of the world’s total data that has been created just within the past two years is?
A. 80% B. 85%
C. 90% D. 95%
Ans : C
Explanation: The overall percentage of the world’s total data created just within the past two years is 90%.
10. Apache Kafka is an open-source platform that was created by?
A. LinkedIn B. Facebook C. Google D. IBM
Ans : A
Explanation: Apache Kafka is an open-source platform that was created by LinkedIn in the year 2011.
11. What was Hadoop named after?
A. Creator Doug Cutting’s favorite circus act B. Cuttings high school rock band C. The toy elephant of Cutting’s son D. A sound Cutting’s laptop made during Hadoop development
Ans : C
Explanation: Doug Cutting, Hadoop’s creator, named the framework after his child’s stuffed toy elephant.
12. What are the main components of Big Data?
A. MapReduce B. HDFS C. YARN D. All of the above
Ans : D
Explanation: All of the above are the main components of Big Data.
13. Point out the correct statement.
A. Hadoop do need specialized hardware to process the data B. Hadoop 2.0 allows live stream processing of real time data C. In Hadoop programming framework output files are divided into lines or records D. None of the above
Ans : B
Explanation: Hadoop batch-processes data distributed over a number of computers, ranging in the 100s and 1000s.
14. Which of the following fields come under the umbrella of Big Data?
A. Black Box Data B. Power Grid Data C. Search Engine Data D. All of the above
Ans : D
Explanation: All options are the fields come under the umbrella of Big Data.
15. Which of the following is not an example of Social Media? 1. Twitter 2. Google 3. Instagram 4. Youtube
Ans: 2 (Google)
16. By 2025, the volume of digital data will increase to 1. TB 2. YB 3. ZB 4. EB Ans: 3 ZB
17. Data Analysis is a process of 1. inspecting data 2. cleaning data 3. transforming data 4. All of Above
Ans. 4 All of above
18. Which of the following is not a major data analysis approaches? 1. Data Mining 2. Predictive Intelligence 3. Business Intelligence
4. Text Analytics
Ans. 2 Predictive Intelligence
19. The Process of describing the data that is huge and complex to store and process is known as 1. Analytics 2. Data mining 3. Big data 4. Data warehouse
Ans. 3 Big data
20. In descriptive statistics, data from the entire population or a sample is summarized with ? 1. Integer descriptor 2. floating descriptor 3. numerical descriptor 4. decimal descriptor
Ans. 3 numerical descriptor
21. Data generated from online transactions is one of the example for volume of big data 1. TRUE 2. FALSE
TRUE
22. Velocity is the speed at which the data is processed 1. True 2. False
False
23. Value tells the trustworthiness of data in terms of quality and accuracy 1. TRUE 2. FALSE
False
24. Hortonworks was introduced by Cloudera and owned by Yahoo 1. True 2. False
False
25. ____ refers to the ability to turn your data useful for business 1. Velocity 2. variety 3. Value 4. Volume
Ans. 3 Value
26. Data Analysis is defined by the statistician? 1. William S. 2. Hans Peter Luhn 3. Gregory Piatetsky-Shapiro 4. John Tukey
Ans. 4 John Tukey
27. Files are divided into ____ sized Chunks. 1. Static 2. Dynamic 3. Fixed 4. Variable
Ans. 3 Fixed
28. _____ is an open source framework for storing data and running application on clusters of commodity hardware. 1. HDFS 2. Hadoop 3. MapReduce 4. Cloud
Ans. 2 Hadoop
29. ____ is factors considered before Adopting Big Data Technology 1. Validation 2. Verification 3. Data 4. Design
Ans. 1 Validation
30. Which among the following is not a Data mining and analytical applications? 1. profile matching
2. social network analysis 3. facial recognition 4. Filtering
Ans. 4 Filtering
31. Which storage subsystem can support massive data volumes of increasing size. 1. Extensibility 2. Fault tolerance 3. Scalability 4. High-speed I/O capacity
Ans. 3 Scalability
32. ______ is a programming model for writing applications that can process Big Data in parallel on multiple nodes. 1. HDFS 2. MAP REDUCE 3. HADOOP 4. HIVE Ans. MAP REDUCE
33. How many main statistical methodologies are used in data analysis?
A. 2 B. 3 C. 4 D. 5
Ans : A
Explanation: In data analysis, two main statistical methodologies are used Descriptive statistics and Inferential statistics.
34. The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
A. TRUE B. FALSE C. Can be true or false D. Can not say
Ans : A
Explanation: The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
35. The branch of statistics which deals with development of particular statistical methods is classified as 1. industry statistics 2. economic statistics 3. applied statistics 4. applied statistics
Ans. applied statistics
36. Point out the correct statement. a) Descriptive analysis is first kind of data analysis performed b) Descriptions can be generalized without statistical modelling c) Description and Interpretation are same in descriptive analysis d) None of the mentioned
Answer: b Explanation: Descriptive analysis describe a set of data.
37. What are the five V’s of Big Data?
A. Volume
B. Velocity
C. Variety
D. All the above
Answer: Option D
38. What are the main components of Big Data?
A. MapReduce
B. HDFS
C. YARN
D. All of these
Answer: Option D
39. What are the different features of Big Data Analytics?
A. Open-Source
B. Scalability
C. Data Recovery
D. All the above
Answer: Option D
40. Which of the following refers to the problem of finding abstracted patterns (or structures) in the unlabeled data?
A. Supervised learning
B. Unsupervised learning
C. Hybrid learning
D. Reinforcement learning
Answer: B
Explanation: Unsupervised learning is a type of machine learning algorithm that is generally used to find the hidden structured and patterns in the given unlabeled data.
41. Which one of the following refers to querying the unstructured textual data?
A. Information access
B. Information update
C. Information retrieval
D. Information manipulation
Answer: C
Explanation: Information retrieval refers to querying the unstructured textual data. We can also understand information retrieval as an activity (or process) in which the tasks of obtaining information from system recourses that are relevant to the information required from the huge source of information.
42. For what purpose, the analysis tools pre-compute the summaries of the huge amount of data?
A. In order to maintain consistency
B. For authentication
C. For data access
D. To obtain the queries response
Answer: D
Explanation: Whenever a query is fired, its response must be produced quickly; for this reason, the analysis tools pre-compute summaries of the huge amount of data.
43. Which one of the following statements is not correct about the data cleaning?
A. It refers to the process of data cleaning
B. It refers to the transformation of wrong data into correct data
C. It refers to correcting inconsistent data
D. All of the above
Answer: d
Explanation: Data cleaning is a kind of process that is applied to data set to remove the noise from the data (or noisy data), inconsistent data from the given data. It also involves the process of transformation where wrong data is transformed into the correct data as well. In other words, we can also say that data cleaning is a kind of pre-process in which the given set of data is prepared for the data warehouse.
44. Any data with unknown form or the structure is classified as _ data. a. Structured b. Unstructured c. Semi-structured d. None of above Ans. b
45.____ means relating to the issuing of reports. a. Analysis b. Reporting c. Reporting and Analysis d. None of the above
Ans. b
46. Veracity involves the reliability of the data; this is ________ due to the numerous data sources of big data a) Easy and difficulty b) Easiness c) Demanding d) none of these
Ans. c
47. ____ is a process of defining the measurement of a phenomenon that is not directly measurable, though its existence is implied by other phenomena. a. Data preparation b. Model planning c. Communicating results d. Operationalization
Ans. d
48. _____data is data whose elements are addressable for effective analysis.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. a
49. ______data is information that does not reside in a relational database but that have some organizational properties that make it easier to analyze.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. b
50. ______data is a data which is not organized in a predefined manner or does not have a predefined data model, thus it is not a good fit for a mainstream relational database.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. c
51. There are ___ types of big data.
a. 2
b. 3 c. 4 d. 5
Ans. b
52. Google search is an example of _________ data.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. c
UNIT 2
1. Sentiment Analysis is an example of 1. Regression 2. Classification 3. clustering 4. Reinforcement Learning
1. 1, 2 and 4 2. 1, 2 and 3 3. 1 and 3 4. 1 and 2
Show Answer Ans. 1, 2 and 4
2. The self-organizing maps can also be considered as the instance of _________ type of learning.
A. Supervised learning B. Unsupervised learning C. Missing data imputation D. Both A & C
Answer: B Explanation: The Self Organizing Map (SOM), or the Self Organizing Feature Map is a kind of Artificial Neural Network which is trained through unsupervised learning.
3. The following given statement can be considered as the examples of_________
Suppose one wants to predict the number of newborns according to the size of storks' population by performing supervised learning
A. Structural equation modeling B. Clustering C. Regression D. Classification
Answer: C
Explanation: The above-given statement can be considered as an example of regression. Therefore the correct answer is C.
4. In the example predicting the number of newborns, the final number of total newborns can be considered as the _________
A. Features B. Observation C. Attribute D. Outcome
Answer: D
Explanation: In the example of predicting the total number of newborns, the result will be represented as the outcome. Therefore, the total number of newborns will be found in the outcome or addressed by the outcome.
5. Which of the following statement is true about the classification?
A. It is a measure of accuracy B. It is a subdivision of a set C. It is the task of assigning a classification D. None of the above
Answer: B
Explanation: The term "classification" refers to the classification of the given data into certain sub-classes or groups according to their similarities or on the basis of the specific given set of rules.
6. Which one of the following correctly refers to the task of the classification?
A. A measure of the accuracy, of the classification of a concept that is given by a certain theory B. The task of assigning a classification to a set of examples C. A subdivision of a set of examples into a number of classes D. None of the above
Answer: B
Explanation: The task of classification refers to assigning each example in a set to one of a number of classes; therefore the correct answer is B.
7. _____is an observation which contains either very low value or very high value in comparison to other observed values. It may hamper the result, so it should be avoided. a. Dependent Variable b. Independent Variable c. Outlier Variable d. None of the above Ans. c
8. _______is a type of regression which models the non-linear dataset using a linear model.
a. Polynomial Regression b. Logistic Regression c. Linear Regression d. Decision Tree Regression
Ans. a
9. The prediction of the weight of a person when his height is known, is a simple example of regression. The function used in R language is_____.
a. lm() b. print() c. predict() d. summary( )
Ans. c
10. There is the following syntax of lm() function in multiple regression.
lm(y ~ x1+x2+x3...., data) a. y is predictor and x1,x2,x3 are the dependent variables. b. y is dependent and x1,x2,x3 are the predictors. c. data is predictor variable. d. None of the above.
Ans. b
11. _______is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph.
a. A Bayesian network b. Bayes Network c. Bayesian Model
d. All of the above
Ans. d
12. In support vector regression, _____is a function used to map lower dimensional data into higher dimensional data
A) Boundary line B) Kernel C) Hyper Plane D) Support Vector Ans. B
13. If the independent variables are highly correlated with each other, then such a condition is called___________ a) outlier b) Multicollinearity c) under fitting d) independent variable
Ans. b
14. The Bayesian network graph does not contain any cyclic graph. Hence, it is known as a ____ or_____.
a. Directed Acyclic Graph or DAG b. Directed Cyclic Graph or DCG. c. Both the above. d. None of the above.
Ans. a
15. The hyperplane with maximum margin is called the ______ hyperplane. a. Non-optimal b. Optimal c. None of the above d. Requires one more option
Ans. b
16. One more _____ is needed for non-linear SVM.
a. Dimension b. Attribute c. Both the above d. None of the above
Ans. a
17. A subset of dataset to train the machine learning model, and we already know the output.
a. Training set b. Test set c. Both the above d. None of the above
Ans. a
18. ______is the final step of data preprocessing in machine learning. It is a technique to standardize the independent variables of the dataset in a specific range. In_____, we put our variables in the same range and in the same scale so that no any variable dominate the other variable
a. Feature Sampling b. Feature Scaling c. None of the above d. Both the above
Ans. b
19. Principal components analysis (PCA) is a statistical technique that allows identifying underlying linear patterns in a data set so it can be expressed in terms of other data set of a significantly ____ dimension without much loss of information. a. Lower b. Higher c. Equal d. None of the above
Ans. a
20. _____ units which are internal to the network and do not directly interact with the environment. a. Input
b. Output c. Hidden d. None of the above
Ans. c
21. In a ____ network there is an ordering imposed on the nodes in a network: if there is a connection from unit a to unit b then there can-not be a connection from b to a. a. Feedback b. Feed-Forward c. None of the above
Ans. b
22. _____ contains the multiple logical values and these values are the truth values of a variable or problem between 0 and 1. This concept was introduced by Lofti Zadeh in 1965 a. Boolean Logic b. Fuzzy Logic c. None of the above
Ans. b
23. ______is a module or component, which takes the fuzzy set inputs generated by the Inference Engine, and then transforms them into a crisp value. a. Fuzzification b. Defuzzification c. Inference Engine d. None of the above
Ans. b
24. The most common application of time series analysis is forecasting future values of a numeric value using the ______ structure of the ____ a. Shares,data b. Temporal,data c. Permanent,data d. None of these
Ans. b
25. Identify the component of a time series
a. Temporal b. Shares c. Trend d. Policymakers
Ans. c
26. Predictable pattern that recurs or repeats over regular intervals. Seasonality is often observed within a year or less: This define the term__________ a. Trend b. Seasonality c. Cycles d. Recession
Ans. b
27. ________Learning uses a training set that consists of a set of pattern pairs: an input pattern and the corresponding desired (or target) output pattern. The desired output may be regarded as the ‘network’s ‘teacher” for that input a. Unsupervised b. Supervised c. Modular d. Object
Ans. b
28. The _______ perceptron consists of a set of input units connected by a single layer of weights to a set of output units a. Multi layer b. Single layer c. Hidden layer d. None of these
Ans. b
29. If we add another layer of weights to a single-layer perceptron, we find a new set of units that are neither input nor output units; for simplicity, a network with more than two layers of weights is considered a a. Single layer perceptron b. Multi layer perceptron c. Hidden layer d. None of these
Ans. b
30. Patterns that repeat over a certain period of time a. Seasonal b. Trend c. None of the above d. Both of the above
Ans. a
31. Which of the following is characteristic of best machine learning method ?
a. Fast b. Accuracy c. Scalable d. All of the Mentioned
Ans. d
32. Supervised learning differs from unsupervised clustering in that supervised learning requires a. at least one input attribute. b. input attributes to be categorical. c. at least one output attribute. d. ouput attriubutes to be categorical. Ans. d
33. Supervised learning and unsupervised clustering both require at least one a. hidden attribute. b. output attribute. c. input attribute. d. categorical attribute. Ans. c
34. Which statement is true about prediction problems? a. The output attribute must be categorical. b. The output attribute must be numeric. c. The resultant model is designed to determine future outcomes. d. The resultant model is designed to classify current behavior. Ans. c
35. Which statement is true about neural network and linear regression models? a. Both models require input attributes to be numeric. b. Both models require numeric attributes to range between 0 and 1. c. The output of both models is a categorical attribute value. d. Both techniques build models whose output is determined by a linear sum of weighted input attribute values. Ans. a
36. A feed-forward neural network is said to be fully connected when a. all nodes are connected to each other. b. all nodes at the same layer are connected to each other. c. all nodes at one layer are connected to all nodes in the next higher layer. d. all hidden layer nodes are connected to all output layer nodes. Ans. c
37. Machine learning techniques differ from statistical techniques in that machine learning methods a. typically assume an underlying distribution for the data. b. are better able to deal with missing and noisy data. c. are not able to explain their behavior. d. have trouble with large-sized datasets. Ans. b
38. This supervised learning technique can process both numeric and categorical input attributes. a. linear regression b. Bayes classifier c. logistic regression d. backpropagation learning Ans. b
39. This technique associates a conditional probability value with each data instance. a. linear regression b. logistic regression c. simple regression
d. multiple linear regression Ans. b
40. Logistic regression is a ________ regression technique that is used to model data having a _____outcome. a. linear, numeric b. linear, binary c. nonlinear, numeric d. nonlinear, binary Ans. d
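The "nonlinear, binary" answer reflects the standard logistic (sigmoid) model, which squashes a linear combination of the inputs into a probability for a binary outcome:

$$ P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}} $$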
41. Which of the following problems is best solved using time-series analysis? a. Predict whether someone is a likely candidate for having a stroke. b. Determine if an individual should be given an unsecured loan. c. Develop a profile of a star athlete. d. Determine the likelihood that someone will terminate their cell phone contract.
Ans. d
42. Which of the following is true about Naive Bayes? a. Assumes that all the features in a dataset are equally important b. Assumes that all the features in a dataset are independent c. Both A and B d. None of the above options Ans. c
43. Simple regression assumes a __________ relationship between the input attribute and output attribute. a. linear b. quadratic c. reciprocal d. inverse
Ans. a
44. With Bayes classifier, missing data items are a. treated as equal compares. b. treated as unequal compares. c. replaced with a default value. d. ignored.
45. What is Machine learning? a. The autonomous acquisition of knowledge through the use of computer programs b. The autonomous acquisition of knowledge through the use of manual programs c. The selective acquisition of knowledge through the use of computer programs d. The selective acquisition of knowledge through the use of manual programs
Ans: a
46. Automated vehicle is an example of ______ a. Supervised learning b. Unsupervised learning c. Active learning d. Reinforcement learning
Ans: a
47. Multilayer perceptron network is a. Usually, the weights are initially set to small random values b. A hard-limiting activation function is often used c. The weights can only be updated after all the training vectors have been presented d. Multiple layers of neurons allow for less complex decision boundaries than a single layer
Ans: a
48. Neural networks a. optimize a convex cost function b. cannot be used for regression as well as classification c. always output values between 0 and 1 d. can be used in an ensemble
Ans: d
49. In neural networks, nonlinear activation functions such as sigmoid, tanh, and ReLU a. speed up the gradient calculation in backpropagation, as compared to linear units b. are applied only to the output units c. help to learn nonlinear decision boundaries d. always output values between 0 and 1
Ans: c
50. Which of the following is a disadvantage of decision trees?
a. Factor analysis b. Decision trees are robust to outliers c. Decision trees are prone to be overfit d. None of the above
Ans: c
51. Back propagation is a learning technique that adjusts weights in the neural network by propagating weight changes. a. Forward from source to sink b. Backward from sink to source c. Forward from source to hidden nodes d. Backward from sink to hidden nodes
Ans: b
52. Identify the following activation function:
φ(V) = Z + 1 / (1 + exp(−X·V + Y)), where Z, X, Y are parameters
a. Step function b. Ramp function c. Sigmoid function d. Gaussian function
Ans: c
53. An artificial neuron receives n inputs x1, x2, ..., xn with weights w1, w2, ..., wn attached to the input links. The weighted sum _________________ is computed and passed on to a non-linear filter Φ, called the activation function, to release the output. a. Σ wi b. Σ xi c. Σ wi + Σ xi d. Σ wi * xi
Ans: d
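As a quick illustration of questions 52 and 53, here is a minimal R sketch of a single artificial neuron (the inputs and weights are made-up values, not taken from the questions): the weighted sum Σ wi*xi is computed and then passed through a sigmoid activation.
x <- c(0.5, -1.0, 2.0)                # inputs x1..xn (illustrative values)
w <- c(0.4, 0.3, 0.1)                 # weights w1..wn (illustrative values)
v <- sum(w * x)                       # weighted sum, option (d) of question 53
phi <- function(v) 1 / (1 + exp(-v))  # sigmoid activation, the family in question 52
print(v)                              # 0.1
print(phi(v))                         # about 0.525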
54. With Bayes classifier, missing data items are a. treated as equal compares. b. treated as unequal compares. c. replaced with a default value. d. ignored.
Ans:b
55. Machine learning techniques differ from statistical techniques in that machine learning methods a. typically assume an underlying distribution for the data. b. are better able to deal with missing and noisy data. c. are not able to explain their behavior. d. have trouble with large-sized datasets.
Ans: b
56. Which of the following is true about Naive Bayes?
a. Assumes that all the features in a dataset are equally important b. Assumes that all the features in a dataset are independent c. Both a and b d. None of the above options
Ans: c
57. How many terms are required for building a Bayes model? a. 1 b. 2 c. 3 d. 4
Ans: c
58. What does the Bayesian network provides? a. Complete description of the domain b. Partial description of the domain c. Complete description of the problem d. None of the mentioned
Ans: a
59. How the Bayesian network can be used to answer any query? a. Full distribution b. Joint distribution c. Partial distribution d. All of the mentioned
Ans: b
60. In which of the following learning the teacher returns reward and punishment to learner? a. Active learning b. Reinforcement learning c. Supervised learning d. Unsupervised learning
Ans: b
61. Which of the following is the model used for learning? a. Decision trees b. Neural networks c. Propositional and FOL rules d. All of the mentioned
Ans: d
UNIT - 3
1. Which of the following can be considered as the correct process of Data Mining? a. Infrastructure, Exploration, Analysis, Interpretation, Exploitation b. Exploration, Infrastructure, Analysis, Interpretation, Exploitation c. Exploration, Infrastructure, Interpretation, Analysis, Exploitation d. Exploration, Infrastructure, Analysis, Exploitation, Interpretation
Answer: a
Explanation: The process of data mining contains many sub-processes in a specific order. The correct order in which all sub-processes of data mining executes is Infrastructure, Exploration, Analysis, Interpretation, and Exploitation.
2. Which of the following is an essential process in which the intelligent methods are applied to extract data patterns? a. Warehousing b. Data Mining c. Text Mining d. Data Selection
Answer: b
Explanation: Data mining is a type of process in which several intelligent methods are used to extract meaningful data from the huge collection (or set) of data.
3. What are the functions of Data Mining? a. Association and correctional analysis classification b. Prediction and characterization c. Cluster analysis and Evolution analysis d. All of the above
Answer: d
Explanation: In data mining, there are several functionalities used for performing the different types of tasks. The common functionalities used in data mining are cluster analysis, prediction, characterization, and evolution. Still, the association and correctional analysis classification are also one of the important functionalities of data mining.
4. Which attribute is not indicative of data streaming?
a. Limited amount of memory b. Limited amount of processing time c. Limited amount of input data d. Limited amount of processing power
Ans. c
5. Which of the following statements about data streaming is true?
a. Stream data is always unstructured data. b. Stream data often has a high velocity. c. Stream elements cannot be stored on disk. d. Stream data is always structured data.
Ans. b
6. Which of the following statements about sampling are correct? a. Sampling reduces the amount of data fed to a subsequent data mining algorithm b. Sampling reduces the diversity of the data stream c. Sampling increases the amount of data fed to a data mining algorithm d. Sampling algorithms often need multiple passes over the data
Ans. a
7. Which of the following statements about sampling are correct? a. Sampling reduces the diversity of the data stream b. Sampling increases the amount of data fed to a data mining algorithm c. Sampling algorithms often need multiple passes over the data d. Sampling aims to keep statistical properties of the data intact
Ans. d
8. What is the main difference between standard reservoir sampling and min-wise sampling?
a. Reservoir sampling makes use of randomly generated numbers whereas min-wise sampling does not. b. Min-wise sampling makes use of randomly generated numbers whereas reservoir sampling does not. c. Reservoir sampling requires a stream to be processed sequentially, whereas min-wise does not. d. For larger streams, reservoir sampling creates more accurate samples than min-wise sampling.
Ans. c
9. A Bloom filter guarantees no
a. false positives b. false negatives
c. false positives and false negatives d. false positives or false negatives, depending on the Bloom filter type
Ans. b
10. Which of the following statements about standard Bloom filters is correct?
a. It is possible to delete an element from a Bloom filter. b. A Bloom filter always returns the correct result. c. It is possible to alter the hash functions of a full Bloom filter to create more space. d. A Bloom filter always returns TRUE when testing for a previously added element.
Ans. d
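To make the Bloom filter behaviour in questions 9 and 10 concrete, here is a small R sketch under simplifying assumptions (a 16-bit array and two toy hash functions invented for the example): adding an element sets k bits, and a membership test returns TRUE only if all k bits are set, so a previously added element always tests TRUE (no false negatives), while a fresh element may collide and test TRUE (a false positive).
n <- 16
bits <- rep(FALSE, n)                          # the bit array, initially all 0's
hashes <- list(function(x) (3 * x + 1) %% n + 1,
               function(x) (5 * x + 2) %% n + 1)
bf_add  <- function(x) for (h in hashes) bits[h(x)] <<- TRUE
bf_test <- function(x) all(sapply(hashes, function(h) bits[h(x)]))
bf_add(10); bf_add(25)
print(bf_test(10))   # TRUE: added elements always test TRUE
print(bf_test(99))   # may be TRUE (false positive) or FALSE, but never a false negative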
11. The FM-sketch algorithm uses the number of zeros the binary hash value ends in to make an estimation. Which of the following statements is true about the hash tail?
a. Any specific bit pattern is equally suitable to be used as hash tail. b. Only bit patterns with more 0's than 1's are equally suitable to be used as hash tails. c. Only the bit patterns 0000000..00 (list of 0s) or 111111..11 (list of 1s) are suitable hash tails. d. Only the bit pattern 0000000..00 (list of 0s) is a suitable hash tail.
Ans. a
12. The FM-sketch algorithm can be used to:
a. Estimate the number of distinct elements. b. Sample data with a time-sensitive window. c. Estimate the frequent elements. d. Determine whether an element has already occurred in previous stream data.
Ans. a
13. The DGIM algorithm was developed to estimate the count of 1's occurring within the last k bits of a stream window of size N. Which of the following statements is true about the estimate of the number of 0's based on DGIM?
a. The number of 0's cannot be estimated at all. b. The number of 0's can be estimated with a maximum guaranteed error. c. To estimate the number of 0's and 1's with a guaranteed maximum error, DGIM has to be employed twice, once creating buckets based on 1's and once creating buckets based on 0's. d. None of the above
Ans. b
14. Which of the following statements about the standard DGIM algorithm are false? a. DGIM operates on a time-based window
b. DGIM reduces memory consumption through a clever way of storing counts c. In DGIM, the size of a bucket is always a power of two d. The maximum number of buckets has to be chosen beforehand. Ans. d
15. Which of the following statements about the standard DGIM algorithm are false? a. DGIM operates on a time-based window b. The buckets contain the count of 1's and each 1's specific position in the stream c. DGIM reduces memory consumption through a clever way of storing counts d. In DGIM, the size of a bucket is always a power of two Ans. b
16. What are DGIM’s maximum error boundaries?
a. DGIM always underestimates the true count; at most by 25% b. DGIM either underestimates or overestimates the true count; at most by 50% c. DGIM always overestimates the count; at most by 50% d. DGIM either underestimates or overestimates the true count; at most by 25%
Ans. b
17. Which algorithm should be used to approximate the number of distinct elements in a data stream?
a. Misra-Gries b. Alon-Matias-Szegedy c. DGIM d. None of the above
Ans. d
18. Which of the following statements about Bloom filters are correct?
a. A Bloom filter has the same properties as a standard Hashmap data structure in Java (java.util.HashMap). b. A Bloom filter is full if no more hash functions can be added to it. c. A Bloom filter always returns FALSE when testing for an element that was not previously added d. A Bloom filter always returns TRUE when testing for a previously added element
Ans. d
19. Which of the following statements about Bloom filters are correct?
a. An empty Bloom filter (no elements added to it) will always return FALSE when testing for an element b. A Bloom filter has the same properties as a standard Hashmap data structure in Java (java.util.HashMap). c. A Bloom filter is full if no more hash functions can be added to it. d. A Bloom filter always returns FALSE when testing for an element that was not previously added Ans. a
20. Which of the following streaming windows show valid bucket representations according to the DGIM rules?
a. 1 0 1 1 1 0 1 0 1 1 1 1 0 1 0 1 b. 1 0 1 1 1 0 0 0 0 1 1 0 0 0 1 0 1 1 1 0 0 1 c. 1 1 1 1 0 0 1 1 1 0 1 0 1 d. 1 0 1 1 0 0 0 1 0 1 1 1 0 1 1 0 0 1 0 1 1
Ans. d
21. For which of the following streams is the second-order moment greater than 45?
a. 10 5 5 10 10 10 1 1 1 10 b. 1 1 1 1 1 5 10 10 5 1 c. 10 10 10 10 10 5 5 5 5 5 d. None of above Ans. c
22. For which of the following streams is the second-order moment greater than 50?
a. 10 5 5 10 10 10 1 1 1 10 b. 10 10 10 10 10 10 10 10 10 10 c. 10 10 10 10 10 5 5 5 5 5 d. None of above
Ans. b
23. Which of the following statements is correct about data mining?
a. It can be referred to as the procedure of mining knowledge from data b. Data mining can be defined as the procedure of extracting information from a set of the data c. The procedure of data mining also involves several other processes like data cleaning, data transformation, and data integration d. All of the above
Answer: d
Explanation: The term data mining can be defined as the process of extracting information from the massive collection of data. In other words, we can also say that data mining is the procedure of mining useful knowledge from a huge set of data.
24. The classification of the data mining system involves:
a. Database technology b. Information Science c. Machine learning d. All of the above
Answer: d
Explanation: Generally, the classification of a data mining system depends on the following criteria: Database technology, machine learning, visualization, information science, and several other disciplines.
25. The issues like efficiency, scalability of data mining algorithms comes under_______
a. Performance issues b. Diverse data type issues
c. Mining methodology and user interaction d. All of the above
Answer: a
Explanation: In order to extract information effectively from a huge collection of data in databases, the data mining algorithm must be efficient and scalable. Therefore the correct answer is A.
26. In data streams, data is……..
a. continuous
b. discrete
c. scattered
d. none of above
Ans. a
27. In mining data streams, the data should be of…
a. same type
b. different type
c. binary
d. none of above
Ans. b
28. Which one is not a component of a data stream management system?
a. Data stream
b. system processor
c. SQL engine
d. storage
Ans. c
29. Which of the following statements is true about mining data streams?
a. Data rate is not controlled by the system
b. Data type is same for all data streams
c. Data is divided in chunks and later stored in database
d. none of above
Ans. a
30. Which of the following is the data stream source?
a. Sensors data
b. Web/traffic camera data
c. Image data
d. all of above
Ans. d
31. What are the different operations on stream?
a. Sampling
b. counting distinct elements
c. Filtering
d. All of above
Ans. d
32. Which one is not the data stream process?
a. Finding frequent item
b. sampling
c. Filtering
d. Counting distinct elements.
Ans. a
33. In Flajolet-Martin algorithm if the stream contains n elements with m of them unique, this algorithm runs in
a. O(n) time
b. constant time
c. O(2n) time
d. O(3n)time
Ans. a
34. Which algorithm should we implement to know how many distinct users visited the website till now or in the last 2 hours?
a. SVM
b. DGIM
c. FM
d. Clustering
Ans. c
35. In the FM algorithm, we use the estimate ............... for the number of distinct elements seen in the stream.
a. 2^R
b. 3R
c. 2R
d. None of the above
Ans. a
36. In a sliding window of size w, an element arriving at time t expires at
a. w
b. t
c. t + w
d. t - w
Ans. c
37. Real-time data stream is _______
a. sequence of data items that arrive in some order and may be seen only once.
b. sequence of data items that arrive in some order and may be seen twice.
c. sequence of data items that arrive in same order
d. sequence of data items that arrive in different order
ans. a
38. Which of the following statements about standard Bloom filters is correct?
a. It is possible to delete an element from a Bloom filter.
b. A Bloom filter always returns the correct result.
C. It is possible to alter the hash functions of a full Bloom filter to create more space.
d. A Bloom filter always returns TRUE when testing for a previously added element.
Ans. d
39. What are DGIM’s maximum error boundaries?
a. DGIM always underestimates the true count; at most by 25%
b. DGIM either underestimates or overestimates the true count; at most by 50%
c. DGIM always overestimates the count; at most by 50%
d. DGIM either underestimates or overestimates the true count; at most by 25%
Ans. b
40. Which of the following statements about the standard DGIM algorithm are false?
a. DGIM operates on a time-based window.
b. DGIM reduces memory consumption through a clever way of storing counts.
c. In DGIM, the size of a bucket is always a power of two
d. The maximum number of buckets has to be chosen beforehand.
Ans. d
41. In DGIM, when forming a bucket, _____
a. Every bucket should have at least one 1, else no bucket can be formed
b. Every bucket should have at least two 1, else no bucket can be formed
c. Every bucket should have at least three 1, else no bucket can be formed
d. Every bucket should have at least four 1, else no bucket can be formed
Ans. a
42. Which attribute is not indicative for data streaming?
a. Limited amount of memory
b. Limited amount of processing time
c. Limited amount of input data
d. Limited amount of processing power
Ans. c
43. In Filtering Streams____________
a. Accept those tuples in the stream that meet a criterion.
b. Accept data in the stream that meet a criterion.
c. Accept those class in the stream that meet a criterion
d. Accept rows in the stream that meet a criterion.
Ans. a
44. A Bloom filter consists of_________
a. An array of n bits, initially all 0’s.
b. An array of 1 bits, initially all 0’s.
c. An array of 2 bits, initially all 0’s.
d. An array of n bits, initially all 1’s.
Ans. a
45. The purpose of the Bloom filter is to allow____________
a. through all stream elements whose keys are in Set
b. through all stream elements whose keys are in class
c. through all data elements whose keys are in Set
d. through all touple elements whose keys are in Set
Ans. a
46. The second order moment for the stream a, b, c, b, d, a, c, d, a, b, d, c, a, a, b is
a. 60
b. 59
c. 51
d. 71
Ans. b
47. The second order moment for the stream a, b, c, b, d, a, c, d, a, b, d, c, a, a, b using Alon-Matias-Szegedy Algorithm is
a. 59
b. 67
c. 55
d. 75
Ans. c
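For questions 46 and 47, the exact second-order moment is the sum of the squared occurrence counts, and the Alon-Matias-Szegedy (AMS) estimate averages n*(2*c - 1) over a few chosen positions. A short R sketch, assuming the three AMS variables are placed at positions 3, 8 and 13 (an assumption made here for the worked example), reproduces both keyed answers:
stream <- c("a","b","c","b","d","a","c","d","a","b","d","c","a","a","b")
n <- length(stream)
print(sum(table(stream)^2))            # exact second moment: 25 + 16 + 9 + 9 = 59 (question 46)
ams <- function(pos) {                 # AMS estimate for one variable starting at 'pos'
  x <- stream[pos]
  c_count <- sum(stream[pos:n] == x)   # occurrences of that element from 'pos' onward
  n * (2 * c_count - 1)
}
print(mean(sapply(c(3, 8, 13), ams)))  # (75 + 45 + 45) / 3 = 55 (question 47)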
48. Which of the following stream clustering algorithm can be used for counting 1's in a stream
a. FM Algorithm
b. PCY Algorithm
c. BDMO Algorithm
d. SON Algorithm
Ans. c
49. The time between elements of one stream
a. need not be uniform
b. need to be uniform
c. must be 1ms.
d. must be 1ns
Ans. a
50. In Bloom filter an array of n bits is initialized with
a. all 0s
b. all 1s
c. half 0s and half 1s
d. all -1
Ans. a
51. Estimate the number of distinct elements appearing in a stream using the Flajolet-Martin algorithm. The given stream is 4, 2, 5, 9, 1, 6, 3, 7 and the hash function is h(x) = (3x + 1) mod 32.
a. 12
b. 16
c. 8
d. 9
Ans. b
52. Estimate the number of distinct elements appearing in a stream using the Flajolet-Martin algorithm. The given stream is 4, 2, 5, 9, 1, 6, 3, 7 and the hash function is h(x) = (x + 6) mod 32.
a. 8
b. 16
c. 12
d. 20
Ans. a
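A short R check for questions 51 and 52: hash each element, count the trailing zeros of the binary hash value, take the maximum R over the stream, and estimate the number of distinct elements as 2^R.
trailing_zeros <- function(v) { r <- 0; while (v > 0 && v %% 2 == 0) { r <- r + 1; v <- v %/% 2 }; r }
fm_estimate <- function(stream, h) 2^max(sapply(h(stream), trailing_zeros))
stream <- c(4, 2, 5, 9, 1, 6, 3, 7)
print(fm_estimate(stream, function(x) (3 * x + 1) %% 32))  # 16 (question 51)
print(fm_estimate(stream, function(x) (x + 6) %% 32))      # 8  (question 52)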
UNIT -4
1. Movie Recommendation systems are an example of: 1. Classification 2. Clustering 3. Reinforcement Learning 4. Regression
Options: a. 2 only b. 1 and 3 c. 1 and 2 d. 2 and 3
Ans. 1 and 3
2. In the following given diagram, which type of clustering is used?
a. Hierarchical b. Naive Bayes c. Partitional d. None of the above
Answer: a
Explanation: In the above-given diagram, the hierarchical type of clustering is used. Hierarchical clustering categorizes data across a range of scales by building a cluster tree, so the correct answer is A.
3. Which of the following statements is incorrect about the hierarchal clustering?
a. The hierarchal type of clustering is also known as the HCA b. The choice of an appropriate metric can influence the shape of the cluster c. In general, the splits and merges both are determined in a greedy manner d. All of the above
Answer: d
Explanation: Statements a, b, and c are all correct statements about hierarchical clustering.
4. Which one of the following can be considered as the final output of the hierarchal type of clustering?
a. A tree which displays how close things are to each other b. Assignment of each point to clusters c. Final estimation of cluster centroids d. None of the above
Answer: a
Explanation: Hierarchical clustering follows an agglomerative (merging) approach, and its final output is a tree (dendrogram) showing how close the items are to each other.
5. Which one of the following statements about the K-means clustering is incorrect?
a. The goal of the k-means clustering is to partition (n) observation into (k) clusters b. K-means clustering can be defined as the method of quantization c. The nearest neighbor is the same as the K-means
d. All of the above
Answer: c
Explanation: K-means clustering and the k-nearest-neighbour algorithm are unrelated techniques, so statement c is incorrect.
6. Which of the following statements about hierarchal clustering is incorrect?
a. The hierarchical clustering can primarily be used for the aim of exploration b. The hierarchical clustering should not be primarily used for the aim of exploration c. Both A and B d. None of the above
Answer: b
Explanation: Hierarchical clustering is a deterministic technique and is well suited to exploration, so the incorrect statement is b.
7. Which one of the clustering technique needs the merging approach?
a. Partitional b. Naive Bayes c. Hierarchical d. Both a and c
Answer: c
Explanation: Hierarchical clustering is one of the most commonly used methods for analysing social network data. In this method, nodes are compared with one another on the basis of their similarities, and larger groups are formed by merging nodes or groups of nodes that share similar characteristics.
8. Which one of the following correctly defines the term cluster?
a. Group of similar objects that differ significantly from other objects b. Symbolic representation of facts or ideas from which information can potentially be extracted c. Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm d. All of the above
Answer: a
Explanation: The term "cluster" refers to a set of similar objects or items that differ significantly from the other available objects; clustering groups objects that share characteristics, so the correct answer is A.
9. Hierarchical clustering should be mainly used for exploration.
a. True
b. False
c. May be true or false
d. None of the above
Answer: a
10. K-means clustering involves a number of iterations and is not deterministic.
a. True
b. False
c. May be true or false
d. None of the above
Answer: a
11. Which function is used for k-means clustering?
(A). k-means
(B). k-mean
(C). heatmap
(D). none of the mentioned
MCQ Answer: a
12. Which is needed by K-means clustering?
(A). defined distance metric
(B). number of clusters
(C). initial guess as to cluster centroids
(D). all of these
MCQ Answer: d
13. Which is conclusively produced by Hierarchical Clustering?
(A). final estimation of cluster centroids
(B). tree showing how nearby things are to each other
(C). assignment of each point to clusters
(D). all of these
MCQ Answer: b
14. Which clustering technique requires a merging approach?
(A). Partitional
(B). Hierarchical
(C). Naive Bayes
(D). None of the mentioned
MCQ Answer: b
15. Which of the following is finally produced by Hierarchical Clustering?
a) final estimate of cluster centroids
b) tree showing how close things are to each other
c) assignment of each point to clusters
d) all of the mentioned
Ans. b
16. Which of the following is required by K-means clustering?
a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of the mentioned
Ans. d
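The requirements listed in question 16 (a distance metric, the number of clusters, and an initial guess for the centroids) map directly onto R's built-in kmeans() function. A minimal sketch on made-up 2-D data (the points and seed are invented for the example):
set.seed(42)
pts <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),   # 20 points around (0, 0)
             matrix(rnorm(40, mean = 5), ncol = 2))   # 20 points around (5, 5)
fit <- kmeans(pts, centers = 2)   # k = 2 clusters; Euclidean distance, random initial centroids
print(fit$centers)                # final cluster centroids
print(table(fit$cluster))         # how many points fell into each cluster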
17. Point out the wrong statement.
a) k-means clustering is a method of vector quantization
b) k-means clustering aims to partition n observations into k clusters
c) k-nearest neighbor is same as k-means
d) None of the mentioned
Ans. c
18. Which of the following combination is incorrect?
a) Continuous – euclidean distance
b) Continuous – correlation similarity
c) Binary – manhattan distance
d) None of the mentioned
Ans. d
19. Which of the following can act as possible termination conditions in K-Means?
1. For a fixed number of iterations.
2. Assignment of observations to clusters does not change between iterations. Except for cases with a bad local minimum.
3. Centroids do not change between successive iterations.
4. Terminate when RSS falls below a threshold.
Options:
a. 1, 3 and 4
b. 1, 2 and 3
c. 1, 2 and 4
d. All of the above
Ans. d
20. A collection of one or more items is called an _____ (a) Itemset (b) Support (c) Confidence (d) Support Count
Ans. a
21. The frequency of occurrence of an itemset is called its _____ (a) Support (b) Confidence (c) Support Count (d) Rules
Ans. c
22. An itemset whose support is greater than or equal to a minimum support threshold is a ______ (a) Itemset (b) Frequent Itemset (c) Infrequent items (d) Threshold values
Ans. b
23. What techniques can be used to improve the efficiency of the Apriori algorithm? (a) Hash-based techniques (b) Transaction Increases (c) Sampling (d) Cleaning
Ans. a
24. What do you mean by support(A)? (a) Total number of transactions containing A (b) Total number of transactions not containing A (c) Number of transactions containing A / Total number of transactions (d) Number of transactions not containing A / Total number of transactions
Ans. c
25. Which of the following is a direct application of frequent itemset mining? (a) Social Network Analysis (b) Market Basket Analysis (c) Outlier Detection (d) Intrusion Detection
Ans. b
26. When do you consider an association rule interesting? (a) If it only satisfies min_support (b) If it only satisfies min_confidence (c) If it satisfies both min_support and min_confidence (d) There are other measures to check
Ans. c
27. What is the relation between a candidate and frequent itemsets? (a) A candidate itemset is always a frequent itemset (b) A frequent itemset must be a candidate itemset (c) No relation between these two (d) Strong relation with transactions
Ans. b
28. Which of the following is not a frequent pattern mining algorithm? (a) Apriori (b) FP growth (c) Decision trees (d) Eclat
Ans. c
29. Which algorithm requires fewer scans of data? (a) Apriori (b) FP Growth (c) Naive Bayes (d) Decision Trees
Ans. b
30. For the question given below, consider the following transactions:
I1, I2, I3, I4, I5, I6
I7, I2, I3, I4, I5, I6
I1, I8, I4, I5
I1, I9, I10, I4, I6
I10, I2, I4, I11, I5
With support as 0.6, find all frequent itemsets.
(a) <I1>, <I2>, <I4>, <I5>, <I6>, <I1, I4>, <I2, I4>, <I2, I5>, <I4, I5>, <I4, I6>, <I2, I4, I5>
(b) <I2>, <I4>, <I5>, <I2, I4>, <I2, I5>, <I4, I5>, <I2, I4, I5>
(c) <I11>, <I4>, <I5>, <I6>, <I1, I4>, <I5, I4>, <I11, I5>, <I4, I6>, <I2, I4, I5>
(d) <I1>, <I4>, <I5>, <I6>
Ans. a
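A brute-force support check in R confirms the keyed answer (a quick sketch; the tx list below just re-enters the five transactions from question 30, and a minimum support of 0.6 over 5 transactions means at least 3 occurrences):
tx <- list(c("I1","I2","I3","I4","I5","I6"),
           c("I7","I2","I3","I4","I5","I6"),
           c("I1","I8","I4","I5"),
           c("I1","I9","I10","I4","I6"),
           c("I10","I2","I4","I11","I5"))
# Support count of an itemset = number of transactions containing all of its items.
support <- function(items) sum(sapply(tx, function(t) all(items %in% t)))
print(support(c("I1", "I4")))        # 3 -> frequent (>= 0.6 * 5)
print(support(c("I2", "I4", "I5")))  # 3 -> frequent
print(support("I3"))                 # 2 -> not frequent, so I3 is dropped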
31. What will happen if support is reduced? (a) Number of frequent itemsets remains the same (b) Some itemsets will add to the current set of frequent itemsets (c) Some itemsets will become infrequent while others will become frequent (d) Can not say
Ans. b
32. What is association rule mining? (a) Same as frequent itemset mining (b) Finding of strong association rules using frequent itemsets (c) Using association to analyze correlation rules (d) Finding Itemsets for future trends
Ans. b
33. What does FP growth algorithm do?
a. It mines all frequent patterns through pruning rules with lesser support b. It mines all frequent patterns through pruning rules with higher support c. It mines all frequent patterns by constructing a FP tree d. All of these
Ans. c
34. Which technique finds the frequent itemsets in just two database scans?
a. Partitioning b. sampling c. hashing d. None of these
Ans. a
35. Which of the following is true?
a. Both apriori and FP-Growth uses horizontal data format b. Both apriori and FP-Growth uses vertical data format c. Both a and b d. None of these
Ans. a
36. What is the principle on which Apriori algorithm work?
a. If a rule is infrequent, its specialized rules are also infrequent b. If a rule is infrequent, its generalized rules are also infrequent c. Both a and b d. None of these
Ans. a
37. What are closed frequent itemsets?
a. A closed itemset b. A frequent itemset c. An itemset which is both closed and frequent d. None of these
Ans. c
38. What are maximal frequent itemsets?
a. A frequent itemset none of whose supersets is frequent
b. A frequent itemset whose superset is also frequent
c. Both a and b
d. None of these
Ans. a
39. What is frequent pattern growth?
a. Same as frequent itemset mining b. Use of hashing to make discovery of frequent itemsets more efficient c. Mining of frequent itemsets without candidate generation d. None of these
Ans. c
40. When is sub-itemset pruning done?
a. A frequent itemset ‘P’ is a proper subset of another frequent itemset ‘Q’ b. Support (P) = Support(Q) c. When both a and b is true
d. When a is true and b is not
Ans. c
41. The Apriori algorithm works in a ______ and ______ fashion.
a. top-down and depth-first b. top-down and breadth-first c. bottom-up and depth-first d. bottom-up and breadth-first
Ans. d
42. In association rule mining, the generation of the frequent itemsets is the computationally intensive step.
a. TRUE b. FALSE c. Both a and b d. None of these
Ans. a
43. The number of iterations in Apriori ______
a. increases with the size of the data b. decreases with the increase in size of the data c. increases with the size of the maximum frequent set d. decreases with increase in size of the maximum frequent set
Ans. c
44. Which of the following are interestingness measures for association rules?
a. recall b. lift c. accuracy d. compactness
Ans. b
45. In Apriori algorithm, if 1 item-sets are 100, then the number of candidate 2 item-sets are
a. 100 b. 4950 c. 200 d. 5000
Ans. b
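The count in question 45 follows from pairing up the 100 frequent 1-itemsets: C(100, 2) = 100 * 99 / 2 = 4950, which a one-line R check confirms.
print(choose(100, 2))   # 4950 candidate 2-itemsets from 100 frequent 1-itemsets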
46. Significant Bottleneck in the Apriori algorithm is
a. Finding frequent itemsets b. pruning c. Candidate generation d. Number of iterations
Ans. c
47. Which Association Rule would you prefer
a. High support and medium confidence b. High support and low confidence c. Low support and high confidence d. Low support and low confidence
Ans. c
48. The apriori property means
a. If a set cannot pass a test, its supersets will also fail the same test b. To decrease the efficiency, do level-wise generation of frequent item sets c. To improve the efficiency, do level-wise generation of frequent item sets d. If a set can pass a test, its supersets will fail the same test
Ans. a
49. If an item set ‘XYZ’ is a frequent item set, then all subsets of that frequent item set are
a. undefined b. not frequent c. frequent d. cant say
Ans. c
50. To determine association rules from frequent item sets
a. Only minimum confidence needed b. Neither support nor confidence needed c. Both minimum support and confidence are needed d. Minimum support is needed
Ans. c
51. If {A,B,C,D} is a frequent itemset, which of the following candidate rules is not possible?
a. C –> A b. D –> ABCD c. A –> BC d. B –> ADC
Ans. b
UNIT - 5
1. Who developed Hadoop?
A. Apache Software Foundation B. Hadoop Software Foundation C. Sun Microsystems D. Bell Labs
Ans : A
Explanation: Hadoop was developed by the Apache Software Foundation.
2. Hadoop is written in which language?
A. C B. C++ C. Java D. Python
Ans : C
Explanation: Hadoop is written in Java.
3. What was the Initial release date of hadoop?
A. 1st April 2007 B. 1st April 2006 C. 1st April 2008 D. 1st April 2005
Ans : B
Explanation: Hadoop's initial release was on April 1, 2006.
4. What license is Hadoop distributed under?
A. Apache License 2.1 B. Apache License 2.2 C. Apache License 2.0 D. Apache License 1.0
Ans : C
Explanation: Hadoop is Open Source, released under Apache 2 license.
5. IBM and ________ have announced a major initiative to use Hadoop to support university courses in distributed computer programming.
A. Google B. Apple C. Facebook D. Microsoft
Ans : A
Explanation: Google and IBM announced a joint university initiative to address internet-scale computing challenges.
6. On which platform does Hadoop run?
A. Bare metal B. Debian C. Cross-platform D. Unix-Like
Ans : C
Explanation: Hadoop has support for cross platform operating system.
7. Which of the following is not Features Of Hadoop?
A. Suitable for Big Data Analysis B. Scalability C. Robust D. Fault Tolerance
Ans : C
Explanation: "Robust" is not one of the listed features of Hadoop.
8. The MapReduce algorithm contains two important tasks, namely __________.
A. mapped, reduce B. mapping, Reduction C. Map, Reduction D. Map, Reduce
Ans : D
Explanation: The MapReduce algorithm contains two important tasks, namely Map and Reduce.
9. _____ takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
A. Map B. Reduce C. Both A and B D. Node
Ans : A
Explanation: Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
10 ______ task, which takes the output from a map as an input and combines those data tuples into a smaller set of tuples.
A. Map B. Reduce C. Node D. Both A and B
Ans : B
Explanation: Reduce task, which takes the output from a map as an input and combines those data tuples into a smaller set of tuples.
11. In how many stages the MapReduce program executes?
A. 2 B. 3 C. 4 D. 5
Ans : B
Explanation: MapReduce program executes in three stages, namely map stage, shuffle stage, and reduce stage.
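Purely as an illustration of the programming model in question 11 (this is plain base R, not Hadoop; the documents are made up), a word count can be expressed with R's Map() and Reduce() higher-order functions in the same map, shuffle, and reduce stages:
docs <- c("big data analytics", "big data tools", "data streams")
pairs <- unlist(Map(function(doc) strsplit(doc, " ")[[1]], docs))   # map stage: split each document into words
grouped <- split(rep(1, length(pairs)), pairs)                      # shuffle stage: group a count of 1 per occurrence by word
counts <- sapply(grouped, function(v) Reduce(`+`, v))               # reduce stage: sum the counts for each word
print(counts)   # analytics=1, big=2, data=3, streams=1, tools=1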
12. Which of the following is used to schedules jobs and tracks the assign jobs to Task tracker?
A. SlaveNode B. MasterNode C. JobTracker D. Task Tracker
Ans : C
Explanation: JobTracker : Schedules jobs and tracks the assign jobs to Task tracker.
13. Which of the following is used for an execution of a Mapper or a Reducer on a slice of data?
A. Task B. Job C. Mapper D. PayLoad
Ans : A
Explanation: Task : An execution of a Mapper or a Reducer on a slice of data.
14. Which of the following commands runs a DFS admin client?
A. secondaryadminnode B. nameadmin C. dfsadmin D. adminsck
Ans : C
Explanation: dfsadmin : Runs a DFS admin client.
15. Point out the correct statement.
A. MapReduce tries to place the data and the compute as close as possible B. Map Task in MapReduce is performed using the Mapper() function C. Reduce Task in MapReduce is performed using the Map() function D. None of the above
Ans : A
Explanation: This feature of MapReduce is "Data Locality".
16. Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in ____________
A. C B. C# C. Java D. None of the above
Ans : C
Explanation: Hadoop Pipes is a SWIG- compatible C++ API to implement MapReduce applications (non JNITM based).
17. The number of maps is usually driven by the total size of ____________
A. Inputs B. Output C. Task D. None of the above
Ans : A
Explanation: Total size of inputs means the total number of blocks of the input files.
18. What is full form of HDFS?
A. Hadoop File System B. Hadoop Field System C. Hadoop File Search D. Hadoop Field search
Ans : A
Explanation: Hadoop File System was developed using distributed file system design.
19. HDFS works in a __________ fashion.
A. worker-master fashion B. master-slave fashion C. master-worker fashion D. slave-master fashion
Ans : B
Explanation: HDFS follows the master-slave architecture.
20. Which of the following are the Goals of HDFS?
A. Fault detection and recovery B. Huge datasets C. Hardware at data D. All of the above
Ans : D
Explanation: All the above option are the goals of HDFS.
21. ________ NameNode is used when the Primary NameNode goes down.
A. Rack B. Data C. Secondary D. Both A and B
Ans : C
Explanation: Secondary namenode is used for all time availability and reliability.
22. The minimum amount of data that HDFS can read or write is called a _____________.
A. Datanode B. Namenode C. Block D. None of the above
Ans : C
Explanation: The minimum amount of data that HDFS can read or write is called a Block.
23. The default block size is ______.
A. 32MB B. 64MB C. 128MB D. 16MB
Ans : B
Explanation: The default block size is 64MB, but it can be increased as per the need to change in HDFS configuration.
24. For every node (Commodity hardware/System) in a cluster, there will be a _________.
A. Datanode B. Namenode C. Block D. None of the above
Ans : A
Explanation: For every node (Commodity hardware/System) in a cluster, there will be a datanode.
25. Which of the following is not Features Of HDFS?
A. It is suitable for the distributed storage and processing. B. Streaming access to file system data. C. HDFS provides file permissions and authentication. D. Hadoop does not provide a command interface to interact with HDFS.
Ans : D
Explanation: The correct feature is Hadoop provides a command interface to interact with HDFS.
26. HDFS is implemented in _____________ language.
A. Perl B. Python C. Java D. C
Ans : C
Explanation: HDFS is implemented in Java and any computer which can run Java can host a NameNode/DataNode on it.
27. During start up, the ___________ loads the file system state from the fsimage and the edits log file.
A. Datanode B. Namenode C. Block D. ActionNode
Ans : B
Explanation: During start-up, the NameNode loads the file system state from the fsimage file and then applies the changes recorded in the edits log.
28. Which of the following is not true about Pig?
A. Apache Pig is an abstraction over MapReduce B. Pig cannot perform all the data manipulation operations in Hadoop. C. Pig is a tool/platform which is used to analyze larger sets of data representing them as data flows. D. None of the above
Ans : B
Explanation: Pig is generally used with Hadoop; we can perform all the data manipulation operations in Hadoop using Apache Pig.
29. Which of the following is/are a feature of Pig?
A. Rich set of operators B. Ease of programming C. Extensibility D. All of the above
Ans : D
Explanation: All options are the following Features of Pig.
30. In which year apache Pig was released?
A. 2005 B. 2006
C. 2007 D. 2008
Ans : B
Explanation: In 2006, Apache Pig was developed as a research project.
31. Pig mainly operates in how many modes?
A. 2 B. 3 C. 4 D. 5
Ans : A
Explanation: You can run Pig (execute Pig Latin statements and Pig commands) in two modes: interactive mode and batch mode.
32. Which of the following company has developed PIG?
A. Google B. Yahoo C. Microsoft D. Apple
Ans : B
Explanation: Apache Pig was developed as a research project at Yahoo, especially to create and execute MapReduce jobs on every dataset.
33. Which of the following function is used to read data in PIG?
A. Write B. Read C. Perform D. Load
Ans : D
Explanation: PigStorage is the default load function.
34. __________ is a framework for collecting and storing script-level statistics for Pig Latin.
A. Pig Stats B. PStatistics C. Pig Statistics D. All of the above
Ans : C
Explanation: The new Pig statistics and the existing Hadoop statistics can also be accessed via the Hadoop job history file.
35. Which of the following is true statement?
A. Pig is a high level language. B. Performing a Join operation in Apache Pig is pretty simple.
C. Apache Pig is a data flow language. D. All of the above
Ans : D
Explanation: All option are true statement.
36. Which of the following will compile the Pigunit?
A. $pig_trunk ant pigunit-jar B. $pig_tr ant pigunit-jar C. $pig_ ant pigunit-jar D. $pigtr_ ant pigunit-jar
Ans : A
Explanation: The compile will create the pigunit.jar file.
37. Point out the wrong statement.
A. Pig can invoke code in language like Java Only B. Pig enables data workers to write complex data transformations without knowing Java C. Pig's simple SQL-like scripting language is called Pig Latin, and appeals to developers already familiar with scripting languages and SQL D. Pig is complete, so you can do all required data manipulations in Apache Hadoop with Pig
Ans : A
Explanation: Through the User Defined Functions(UDF) facility in Pig, Pig can invoke code in many languages like JRuby, Jython and Java.
38. Which of the following is/are INCORRECT with respect to Hive?
A. Hive provides SQL interface to process large amount of data B. Hive needs a relational database like oracle to perform query operations and store data. C. Hive works well on all files stored in HDFS D. Both A and B
Ans : B
Explanation: Hive needs a relational database like oracle to perform query operations and store data is incorrect with respect to Hive.
39. Which of the following is not a Features of HiveQL?
A. Supports joins B. Supports indexes C. Support views D. Support Transactions
Ans : D
Explanation: Support Transactions is not a Features of HiveQL.
40. Which of the following operator executes a shell command from the Hive shell?
A. | B. !
C. # D. $
Ans : B
Explanation: Exclamation operator is for execution of command.
41. Hive uses _________ for logging.
A. logj4 B. log4l C. log4i D. log4j
Ans : D
Explanation: By default Hive will use hive-log4j.default in the conf/ directory of the Hive installation.
42. HCatalog is installed with Hive, starting with Hive release ___________
A. 0.10.0 B. 0.9.0 C. 0.11.0 D. 0.12.0
Ans : C
Explanation: hcat commands can be issued as hive commands, and vice versa.
43. _______ supports a new command shell Beeline that works with HiveServer2.
A. HiveServer2 B. HiveServer3 C. HiveServer4 D. HiveServer5
Ans : A
Explanation: The Beeline shell works in both embedded mode as well as remote mode.
44. The ________ allows users to read or write Avro data as Hive tables.
A. AvroSerde B. HiveSerde C. SqlSerde D. HiveQLSerde
Ans : A
Explanation: AvroSerde understands compressed Avro files.
45. Which of the following data types is not supported by Hive?
A. map B. record C. string D. enum
Ans : D
Explanation: Hive has no concept of enums.
46. We need to store skill set of MCQs(which might have multiple values) in MCQs table, which of the following is the best way to store this information in case of Hive?
A. Create a column in MCQs table of STRUCT data type B. Create a column in MCQs table of MAP data type C. Create a column in MCQs table of ARRAY data type D. As storing multiple values in a column of MCQs itself is a violation
Ans : C
Explanation: Option C is correct.
47. Letsfindcourse is generating huge amount of data. They are generating huge amount of sensor data from different courses which was unstructured in form. They moved to Hadoop framework for storing and analyzing data. What technology in Hadoop framework, they can use to analyse this unstructured data?
A. MapReduce programming B. Hive C. RDBMS D. None of the above Ans : A
Explanation: MapReduce programming is the right answer.
48. Which of the following is correct statement?
A. HBase is a distributed column-oriented database B. Hbase is not open source C. Hbase is horizontally scalable. D. Both A and C
Ans : D
Explanation: HBase is a distributed column-oriented database built on top of the Hadoop file system. It is an open-source project and is horizontally scalable.
49. Which of the following is not a feature of Hbase?
A. HBase is lateral scalable. B. It has automatic failure support. C. It provides consistent read and writes. D. It has easy java API for client.
Ans : A
Explanation: Option A is incorrect because HBase is linearly scalable.
50. When did HBase was first released?
A. April 2007 B. March 2007 C. February 2007 D. May 2007 Ans : C
Explanation: HBase was first released in February 2007. Later in January 2008, HBase became a sub project of Apache Hadoop.
51. Apache HBase is a non-relational database modeled after Google's _________
A. BigTop B. Bigtable C. Scanner D. FoundationDB
Ans : B
Explanation: Bigtable acts up on Google File System, likewise Apache HBase works on top of Hadoop and HDFS.
52. HBaseAdmin and ____________ are the two important classes in this package that provide DDL functionalities.
A. HTableDescriptor B. HDescriptor C. HTable D. HTabDescriptor
Ans : A
Explanation: Java provides an Admin API to achieve DDL functionalities through programming
53. which of the following is correct statement?
A. HBase provides fast lookups for larger tables. B. It provides low latency access to single rows from billions of records C. HBase is a database built on top of the HDFS. D. All of the above
Ans : D
Explanation: All the options are correct.
54. HBase supports a ____________ interface via Put and Result.
A. bytes-in/bytes-out B. bytes-in C. bytes-out D. None of the above
Ans : A
Explanation: Input could be strings, numbers, complex objects, or even images as long as they can rendered as bytes.
55. Which command is used to disable all the tables matching the given regex?
A. remove all B. drop all C. disable_all D. None of the above
Ans : C
Explanation: The syntax for disable_all command is as follows : hbase > disable_all 'r.*'
56. _________ is the main configuration file of HBase.
A. hbase.xml B. hbase-site.xml C. hbase-site-conf.xml D. hbase-conf.xml
Ans : B
Explanation: Set the data directory to an appropriate location by opening the HBase home folder in /usr/local/HBase.
57. which of the following is incorrect statement?
A. HBase is built for wide tables B. Transactions are there in HBase. C. HBase has de-normalized data. D. HBase is good for semi-structured as well as structured data.
Ans : B
Explanation: No transactions are there in HBase.
58. R was created by?
A. Ross Ihaka B. Robert Gentleman C. Both A and B D. Ross Gentleman
Ans : C
Explanation: R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team.
59. R allows integration with the procedures written in the?
A. C B. Ruby C. Java D. Basic
Ans : A
Explanation: R allows integration with the procedures written in the C, C++, .Net, Python or FORTRAN languages for efficiency.
60. R is free software distributed under a GNU-style copyleft, and an official part of the GNU project called?
A. GNU A B. GNU S C. GNU L D. GNU R
Ans : B
Explanation: R is free software distributed under a GNU-style copyleft, and an official part of the GNU project called GNU S.
61. R made its first appearance in?
A. 1992 B. 1995 C. 1993 D. 1994
Ans : C
Explanation: R made its first appearance in 1993.
62. Which of the following is true about R?
A. R is a well-developed, simple and effective programming language B. R has an effective data handling and storage facility C. R provides a large, coherent and integrated collection of tools for data analysis. D. All of the above
Ans : D
Explanation: All of the above statement are true.
63. Point out the wrong statement?
A. Setting up a workstation to take full advantage of the customizable features of R is a straightforward thing B. q() is used to quit the R program C. R has an inbuilt help facility similar to the man facility of UNIX D. Windows versions of R have other optional help systems also
Ans : B
Explanation: help command is used for knowing details of particular command in R.
64. Command lines entered at the console are limited to about ________ bytes
A. 4095 B. 4096 C. 4097 D. 4098
Ans : A
Explanation: Command lines entered at the R console are limited to about 4095 bytes.
65. R language is a dialect of which of the following languages?
A. s B. c C. sas D. matlab
Ans : A
Explanation: The R language is a dialect of S which was designed in the 1980s. Since the early 90’s the life of the S language has gone down a rather winding path. The scoping rules for R are the main feature that makes it different from the original S language.
66. How many atomic vector types does R have?
A. 3 B. 4 C. 5 D. 6
Ans : D
Explanation: R language has 6 atomic data types. They are logical, integer, real, complex, string (or character) and raw. There is also a class for “raw” objects, but they are not commonly used directly in data analysis.
67. R files has an extension _____.
A. .S B. .RP C. .R D. .SP
Ans : C
Explanation: All R files have an extension .R. R provides a mechanism for recalling and re-executing previous commands. All S programmed files will have an extension .S. But R has many functions than S.
68. What will be output for the following code?
v <- TRUE
print(class(v))
A. logical B. Numeric C. Integer D. Complex
Ans : A
Explanation: It produces the following result : [1] "logical"
69. What will be output for the following code?
v <- "TRUE"
print(class(v))
A. logical B. Numeric C. Integer D. Character
Ans : D
Explanation: It produces the following result : [1] "character"
70. In R programming, the very basic data types are the R-objects called?
A. Lists B. Matrices C. Vectors D. Arrays
Ans : C
Explanation: In R programming, the very basic data types are the R-objects called vectors
71. Data Frames are created using the?
A. frame() function B. data.frame() function C. data() function D. frame.data() function
Ans : B
Explanation: Data Frames are created using the data.frame() function
72. Which functions gives the count of levels?
A. level B. levels C. nlevels D. nlevel
Ans : C
Explanation: Factors are created using the factor() function. The nlevels functions gives the count of levels.
73. Point out the correct statement?
A. Empty vectors can be created with the vector() function B. A sequence is represented as a vector but can contain objects of different classes C. "raw” objects are commonly used directly in data analysis D. The value NaN represents undefined value
Ans : A
Explanation: A vector can only contain objects of the same class.
74. What will be the output of the following R code?
> x <- vector("numeric", length = 10)
> x
A. 1 0 B. 0 0 0 0 0 0 0 0 0 0 C. 0 1 D. 0 0 1 1 0 1 1 0
Ans : B
Explanation: You can also use the vector() function to initialize vectors.
75. What will be output for the following code?
> sqrt(-17)
A. -4.02 B. 4.02 C. 3.67 D. NAN
Ans : D
Explanation: The square root of a negative number is not defined for real numbers, so sqrt(-17) returns NaN along with a warning; sqrt(-17+0i) would give a complex result.
76. _______ function returns a vector of the same size as x with the elements arranged in increasing order.
A. sort() B. orderasc() C. orderby() D. sequence()
Ans : A
Explanation: There are other more flexible sorting facilities available like order() or sort.list() which produce a permutation to do the sorting.
77. What will be the output of the following R code?
> m <- matrix(nrow = 2, ncol = 3)
> dim(m)
A. 3 3 B. 3 2 C. 2 3 D. 2 2
Ans : C
Explanation: dim() returns the number of rows followed by the number of columns, so the result is 2 3.
78. Which loop executes a sequence of statements multiple times and abbreviates the code that manages the loop variable?
A. for B. while C. do-while D. repeat
Ans : D
Explanation: repeat loop : Executes a sequence of statements multiple times and abbreviates the code that manages the loop variable.
79. Which of the following true about for loop?
A. Repeats a statement or group of statements while a given condition is true. It tests the condition before executing the loop body. B. it tests the condition at the end of the loop body. C. Both A and B D. None of the above
Ans : B
Explanation: for loop : Like a while statement, except that it tests the condition at the end of the loop body.
80. Which statement simulates the behavior of R switch?
A. Next B. Previous C. break D. goto
Ans : A
Explanation: The next statement simulates the behavior of R switch.
81. In which statement terminates the loop statement and transfers execution to the statement immediately following the loop?
A. goto B. switch C. break D. label
Ans : C
Explanation: Break : Terminates the loop statement and transfers execution to the statement immediately following the loop.
82. Point out the wrong statement?
A. Multi-line expressions with curly braces are just not that easy to sort through when working on the command line B. lappy() loops over a list, iterating over each element in that list C. lapply() does not always returns a list D. You cannot use lapply() to evaluate a function multiple times each with a different argument
Ans : C
Explanation: lapply() always returns a list, regardless of the class of the input.
83. The mapply() function is a multivariate apply of sorts which applies a function in parallel over a set of arguments.
A. TRUE B. FALSE C. Can be true or false D. Can not say
Ans : A
Explanation: True, The mapply() function is a multivariate apply of sorts which applies a function in parallel over a set of arguments.
84 Which of the following is valid body of split function?
A. function (x, f) B. function (x, f, drop = FALSE, …) C. function (x, drop = FALSE, …) D. function (drop = FALSE, …)
Ans : B
Explanation: x is a vector (or list) or data frame
85. Which of the following character skip during execution?
v <- LETTERS[1:6]
for ( i in v) {
if (i == "D") {
next
}
print(i)
}
A. A B. B C. C D. D
Ans : D
Explanation: When the above code is compiled and executed, it produces the following result : [1] "A" [1] "B" [1] "C" [1] "E" [1] "F"
86. What will be output for the following code?
v <- LETTERS[1]
for ( i in v) {
print(v)
}
A. A B. A B C. A B C D. A B C D
Ans : A
Explanation: The output for the following code : [1] "A"
87. What will be output for the following code?
v <- LETTERS["A"]
for ( i in v) {
print(v)
}
A. A B. NAN C. NA D. Error
Ans : C
Explanation: The output for the following code : [1] NA
88. An R function is created by using the keyword?
A. fun B. function C. declare D. extends
Ans : B
Explanation: An R function is created by using the keyword function.
89. What will be output for the following code?
print(mean(25:82))
A. 1526 B. 53.5 C. 50.5 D. 55
Ans : B
Explanation: The code will find mean of numbers from 25 to 82 that is 53.5
90. Point out the wrong statement?
A. Functions in R are “second class objects” B. The writing of a function allows a developer to create an interface to the code, that is explicitly specified with a set of parameters
C. Functions provides an abstraction of the code to potential users D. Writing functions is a core activity of an R programmer
Ans : A
Explanation: Functions in R are “first class objects”, which means that they can be treated much like any other R object.
91. What will be output for the following code?
> paste("a", "b", se = ":")
A. a+b B. a:b C. a-b D. None of the above
Ans : D
Explanation: With the paste() function, the arguments sep and collapse must be named explicitly and in full if the default values are not going to be used.
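A quick R check of why option d is keyed: in paste(), the sep argument comes after ..., so a shortened name such as se is not matched to sep and is instead pasted as just another string.
print(paste("a", "b", se = ":"))   # "a b :"  - 'se' is treated as data, not as sep
print(paste("a", "b", sep = ":"))  # "a:b"    - sep must be named in full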
92. Which function in R language is used to find out whether the means of 2 groups are equal to each other or not?
A. f.tests() B. l.tests() C. t.test() D. p.tests() Ans : C
Explanation: The t.test() function in R is used to test whether the means of two groups are equal to each other.
93. What will be the output of log (-5.8) when executed on R console?
A. NA B. NAN C. 0.213 D. Error Ans : B
Explanation: Executing log(-5.8) on the R console produces NaN (Not a Number) together with a warning, because the logarithm of a negative number is not defined for real values.
94. Which function is preferred over sapply, as vapply allows the programmer to specify the output type?
A. Lapply B. Japply C. Vapply D. Zapply Ans : C
Explanation: Vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use. simplify2array() is the utility called from sapply() when simplify is not false and is similarly called from mapply().
95. How will you check if an element is present in a vector?
A. Match() B. Dismatch() C. Mismatch() D. Search()
Ans : A
Explanation: It can be done using the match () function- match () function returns the first appearance of a particular element. The other way is to use %in% which returns a Boolean value either true or false.
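A small R illustration of the two approaches mentioned in the explanation (the vector v is made up for the example):
v <- c(10, 20, 30, 20)
print(match(20, v))   # 2: position of the first occurrence of 20
print(20 %in% v)      # TRUE
print(25 %in% v)      # FALSE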
96. You can check to see whether an R object is NULL with the _________ function.
A. is.null() B. is.nullobj() C. null() D. as.nullobj()
Ans : A
Explanation: It is sometimes useful to allow an argument to take the NULL value, which might indicate that the function should take some specific action.
97. In the base graphics system, which function is used to add elements to a plot?
A. Boxplot() B. Text() C. Treat() D. Both A and B
Ans : D
Explanation: In the base graphics system, boxplot or text function is used to add elements to a plot.
98. Which of the following syntax is used to install forecast package?
A. install.pack("forecast") B. install.packages("cast") C. install.packages("forecast") D. install.pack("forecastcast")
Ans : C
Explanation: forecast is used for time series analysis
99. Which splits a data frame and returns a data frame?
A. apply B. ddply C. stats D. plyr
Ans : B
Explanation: ddply splits a data frame and returns a data frame.
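A minimal sketch, assuming the plyr package is installed:
library(plyr)
# split iris by Species and return a data frame of per-group means
ddply(iris, "Species", summarise, mean_sepal = mean(Sepal.Length))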
100. Which of the following is an R package for the exploratory analysis of genetic and genomic data?
A. adeg B. adegenet C. anc D. abd
Ans : B
Explanation: This package contains Classes and functions for genetic data analysis within the multivariate framework.
101. Which of the following contains functions for processing uniaxial minute-to-minute accelerometer data?
A. accelerometry B. abc C. abd D. anc
Ans : A
Explanation: This package contains a collection of functions that perform operations on time-series accelerometer data, such as identifying non-wear time, flagging minutes that are part of an activity bout, and finding the maximum 10-minute average count value.
102. ______ Uses Grieg-Smith method on 2 dimensional spatial data.
A. G.A. B. G2db C. G.S. D. G1DBN
Ans : C
Explanation: The function returns a GriegSmith object which is a matrix with block sizes, sum of squares for each block size as well as mean sums of squares. G1DBN is a package performing Dynamic Bayesian Network Inference.
103. Which of the following package provide namespace management functions not yet present in base R?
A. stringr B. nbpMatching C. messagewarning D. namespace
Ans : D
Explanation: The package namespace is one of the most confusing parts of building a package. nbpMatching contains functions for non-bipartite optimal matching.
104. What will be the output of the following R code?
install.packages(c("devtools", "roxygen2"))
A. Develops the tools B. Installs the given packages C. Exits R studio D. Nothing happens
Ans : B
Explanation: Make sure you have the latest version of R and then run the above code to get the packages you’ll need. It installs the given packages. Confirm that you have a recent version of RStudio.
105. A bundled package is a package that’s been compressed into a ______ file.
A. Double B. Single C. Triple D. No File
Ans : B
Explanation: A bundled package is a package that’s been compressed into a single file. A source package is just a directory with components like R/, DESCRIPTION, and so on.
106. library() is not useful when developing a package since you have to install the package first.
A. TRUE B. FALSE C. Can be true or false D. Can not say
Ans : A
Explanation: library() is not useful when developing a package since you have to install the package first. A library is a simple directory containing installed packages.
107. DESCRIPTION uses a very simple file format called DCF.
A. TRUE B. FALSE C. Can be true or false D. Can not say
Ans : A
Explanation: DESCRIPTION uses a very simple file format called DCF, the Debian control format. When you first start writing packages, you’ll mostly use these metadata to record what packages are needed to run your package.
108. How much data (in MB) does HDFS store in each block, which can be scaled at any time? 1. 32 2. 64 3. 128 4. 256 Answer: 128
109. _____ provides performance through distribution of data and fault tolerance through replication 1. HDFS 2. PIG 3. HIVE 4. HADOOP Answer: HDFS
110. ______ is a programming model for writing applications that can process Big Data in parallel on multiple nodes. 1. HDFS 2. MAP REDUCE 3. HADOOP 4. HIVE Answer: MAP REDUCE
111. ____ takes the grouped key-value paired data as input and runs a Reducer function on each one of them. 1. MAPPER 2. REDUCER 3. COMBINER 4. PARTITIONER Answer: REDUCER
112. ____ is a type of local Reducer that groups similar data from the map phase into identifiable sets. 1. MAPPER 2. REDUCER 3. COMBINER 4. PARTITIONER Answer: COMBINER
113. While installing Hadoop, how many XML files are edited, and which are they? 1. core-site.xml 2. hdfs-site.xml 3. mapred.xml 4. yarn.xml Answer: core-site.xml
**********Module - 1 (Introduction)**********
1.According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop? (A). Big data management and data mining (B). Data warehousing and business intelligence (C). Management of Hadoop clusters (D). Collecting and storing unstructured data Answer -A
2.What are the main components of Big Data? (A). MapReduce (B). HDFS (C). YARN (D). All of these Answer -D
3.What are the different features of Big Data Analytics? (A). Open-Source (B). Scalability (C). Data Recovery (D). All the above Answer -D
4.According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop? (A). Big data management and data mining (B). Data warehousing and business intelligence (C). Management of Hadoop clusters (D). Collecting and storing unstructured data Answer -A
5.What are the four V’s of Big Data? (A). Volume (B). Velocity (C). Variety (D). All the above Answer -D
6.IBM and ________ have announced a major initiative to use Hadoop to support university courses in distributed computer programming. (A). Google Latitude (B). Android (operating system) (C). Google Variations (D). Google Answer: d Explanation: Google and IBM Announce University Initiative to Address Internet-Scale.
7.Point out the correct statement. (A). Hadoop is an ideal environment for extracting and transforming small volumes of data (B). Hadoop stores data in HDFS and supports data compression/decompression (C). The Giraph framework is less useful than a MapReduce job to solve graph and machine learning (D). None of the mentioned
Answer: b Explanation: Data compression can be achieved using compression algorithms like bzip2, gzip, LZO, etc. Different algorithms can be used in different scenarios based on their capabilities.
8.What license is Hadoop distributed under? (A). Apache License 2.0 (B). Mozilla Public License (C). Shareware (D). Commercial Answer: a Explanation: Hadoop is Open Source, released under Apache 2 license.
9.Sun also has the Hadoop Live CD ________ project, which allows running a fully functional Hadoop cluster using a live CD. (A). OpenOffice.org (B). OpenSolaris (C). GNU (D). Linux Answer: b Explanation: The OpenSolaris Hadoop LiveCD project built a bootable CD-ROM image.
10.Which of the following genres does Hadoop produce? (A). Distributed file system (B). JAX-RS (C). Java Message Service (D). Relational Database Management System Answer: a Explanation: The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to the user.
11.What was Hadoop written in? (A). Java (software platform) (B). Perl (C). Java (programming language) (D). Lua (programming language) Answer: c Explanation: The Hadoop framework itself is mostly written in the Java programming language, with some native code in C and command-line utilities written as shell-scripts.
12.Which of the following platforms does Hadoop run on? (A). Bare metal (B). Debian (C). Cross-platform (D). Unix-like Answer: c Explanation: Hadoop has support for cross-platform operating system.
13.Hadoop achieves reliability by replicating the data across multiple hosts and hence does not require ________ storage on hosts. (A). RAID (B). Standard RAID levels (C). ZFS (D). Operating system
Answer: a Explanation: With the default replication value, 3, data is stored on three nodes: two on the same rack, and one on a different rack.
14.Above the file systems comes the ________ engine, which consists of one Job Tracker, to which client applications submit MapReduce jobs. (A). MapReduce (B). Google (C). Functional programming (D). Facebook Answer: a Explanation: The MapReduce engine is used to distribute work around a cluster.
15.The Hadoop list includes the HBase database, the Apache Mahout ________ system, and matrix operations. (A). Machine learning (B). Pattern recognition (C). Statistical classification (D). Artificial intelligence Answer: a Explanation: The Apache Mahout project’s goal is to build a scalable machine learning tool.
16.As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including _______________ (A). Improved data storage and information retrieval (B). Improved extract, transform and load features for data integration (C). Improved data warehousing functionality (D). Improved security, workload management, and SQL support Answer: d Explanation: Adding security to Hadoop is challenging because all the interactions do not follow the classic client-server pattern.
17.Point out the correct statement. (A). Hadoop do need specialized hardware to process the data (B). Hadoop 2.0 allows live stream processing of real-time data (C). In Hadoop programming framework output files are divided into lines or records (D). None of the mentioned Answer: b Explanation: Hadoop batch processes data distributed over a number of computers ranging in 100s and 1000s.
18.According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop? (A). Big data management and data mining (B). Data warehousing and business intelligence (C). Management of Hadoop clusters (D). Collecting and storing unstructured data Answer: a Explanation: Data warehousing integrated with Hadoop would give a better understanding of data.
19.Hadoop is a framework that works with a variety of related tools. Common cohorts include ____________ (A). MapReduce, Hive and HBase (B). MapReduce, MySQL and Google Apps (C). MapReduce, Hummer and Iguana (D). MapReduce, Heron and Trumpet Answer: a
Explanation: To use Hive with HBase you’ll typically want to launch two clusters, one to run HBase and the other to run Hive.
20.Point out the wrong statement. (A). Hadoop processing capabilities are huge and its real advantage lies in the ability to process terabytes & petabytes of data (B). Hadoop uses a programming model called “MapReduce”, all the programs should conform to this model in order to work on Hadoop platform (C). The programming model, MapReduce, used by Hadoop is difficult to write and test (D). All of the mentioned Answer: c Explanation: The programming model, MapReduce, used by Hadoop is simple to write and test.
21.What was Hadoop named after? (A). Creator Doug Cutting’s favorite circus act (B). Cutting’s high school rock band (C). The toy elephant of Cutting’s son (D). A sound Cutting’s laptop made during Hadoop development Answer: c Explanation: Doug Cutting, Hadoop creator, named the framework after his child’s stuffed toy elephant.
22.All of the following accurately describe Hadoop, EXCEPT ____________ (A). Open-source (B). Real-time (C). Java-based (D). Distributed computing approach Answer: b Explanation: Apache Hadoop is an open-source software framework for distributed storage and distributed processing of Big Data on clusters of commodity hardware.
23.__________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data. (A). MapReduce (B). Mahout (C). Oozie (D). All of the mentioned Answer: a Explanation: MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm.
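The map-then-reduce idea itself can be sketched in a few lines of plain R (this is only an illustration of the model, not Hadoop):
docs <- c("big data big ideas", "data analytics")
words <- unlist(lapply(docs, function(d) strsplit(d, " ")[[1]]))  # map: emit one entry per word
counts <- tapply(rep(1, length(words)), words, sum)               # shuffle + reduce: group identical keys and sum their counts
counts   # analytics 1, big 2, data 2, ideas 1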
24.__________ has the world’s largest Hadoop cluster. (A). Apple (B). Datamatics (C). Facebook (D). None of the mentioned Answer: c Explanation: Facebook has many Hadoop clusters, the largest among them is the one that is used for Data warehousing.
25.Facebook Tackles Big Data With _______ based on Hadoop. (A). ‘Project Prism’ (B). ‘Prism’ (C). ‘Project Big’ (D). ‘Project Data’
Answer: a Explanation: Prism automatically replicates and moves data wherever it’s needed across a vast network of computing facilities.
26.________ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets. (A). Pig Latin (B). Oozie (C). Pig (D). Hive Answer: c Explanation: Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs.
27.Point out the correct statement. (A). Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data (B). Hive is a relational database with SQL support (C). Pig is a relational database with SQL support (D). All of the mentioned Answer: a Explanation: Hive is a SQL-based data warehouse system for Hadoop that facilitates data summarization, ad hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.
28._________ hides the limitations of Java behind a powerful and concise Clojure API for Cascading. (A). Scalding (B). HCatalog (C). Cascalog (D). All of the mentioned Answer: c Explanation: Cascalog also adds Logic Programming concepts inspired by Datalog. Hence the name “Cascalog” is a contraction of Cascading and Datalog.
29.Hive also supports custom extensions written in ____________ (A). C# (B). Java (C). C (D). C++ Answer: b Explanation: Hive also supports custom extensions written in Java, including user-defined functions (UDFs) and serializer-deserializers for reading and optionally writing custom formats.
30.Point out the wrong statement. (A). Elastic MapReduce (EMR) is Facebook’s packaged Hadoop offering (B). Amazon Web Service Elastic MapReduce (EMR) is Amazon’s packaged Hadoop offering (C). Scalding is a Scala API on top of Cascading that removes most Java boilerplate (D). All of the mentioned Answer: a Explanation: Rather than building Hadoop deployments manually on EC2 (Elastic Compute Cloud) clusters, users can spin up fully configured Hadoop installations using simple invocation commands, either through the AWS Web Console or through command-line tools.
31.________ is the most popular high-level Java API in Hadoop Ecosystem (A). Scalding
(B). HCatalog (C). Cascalog (D). Cascading Answer: d Explanation: Cascading hides many of the complexities of MapReduce programming behind more intuitive pipes and data flow abstractions.
32.___________ is a general-purpose computing model and runtime system for distributed data analytics. (A). Mapreduce (B). Drill (C). Oozie (D). None of the mentioned Answer: a Explanation: Mapreduce provides a flexible and scalable foundation for analytics, from traditional reporting to leading-edge machine learning algorithms.
33.The Pig Latin scripting language is not only a higher-level data flow language but also has operators similar to ____________ (A). SQL (B). JSON (C). XML (D). All of the mentioned Answer: a Explanation: Pig Latin, in essence, is designed to fill the gap between the declarative style of SQL and the low-level procedural style of MapReduce.
34._______ jobs are optimized for scalability but not latency. (A). Mapreduce (B). Drill (C). Oozie (D). Hive Answer: d Explanation: Hive Queries are translated to MapReduce jobs to exploit the scalability of MapReduce.
35.______ is a framework for performing remote procedure calls and data serialization. (A). Drill (B). BigTop (C). Avro (D). Chukwa Answer: c Explanation: In the context of Hadoop, Avro can be used to pass data from one program or language to another.
**********Module - 2 (Hadoop HDFS & Map Reduce)**********
1.A ________ serves as the master and there is only one NameNode per cluster. (A). Data Node (B). NameNode (C). Data block (D). Replication
Answer: b Explanation: All the metadata related to HDFS including the information about data nodes, files stored on HDFS, and Replication, etc. are stored and maintained on the NameNode.
2.Point out the correct statement. (A). DataNode is the slave/worker node and holds the user data in the form of Data Blocks (B). Each incoming file is broken into 32 MB by default (C). Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance (D). None of the mentioned Answer: a Explanation: There can be any number of DataNodes in a Hadoop Cluster.
3.HDFS works in a __________ fashion. (A). master-worker (B). master-slave (C). worker/slave (D). all of the mentioned Answer: a Explanation: NameNode serves as the master and each DataNode serves as a worker/slave
4.________ NameNode is used when the Primary NameNode goes down. (A). Rack (B). Data (C). Secondary (D). None of the mentioned Answer: c Explanation: Secondary namenode is used for all time availability and reliability.
5.Point out the wrong statement. (A). Replication Factor can be configured at a cluster level (Default is set to 3). and also at a file level (B). Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode (C). User data is stored on the local file system of DataNodes (D). DataNode is aware of the files to which the blocks stored on it belong to Answer: d Explanation: NameNode is aware of the files to which the blocks stored on it belong to.
6.Which of the following scenario may not be a good fit for HDFS? (A). HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file (B). HDFS is suitable for storing data related to applications requiring low latency data access (C). HDFS is suitable for storing data related to applications requiring low latency data access (D). None of the mentioned Answer: a Explanation: HDFS can be used for storing archive data since it is cheaper as HDFS allows storing the data on low cost commodity hardware while ensuring a high degree of fault-tolerance.
7.The need for data replication can arise in various scenarios like ____________ (A). Replication Factor is changed (B). DataNode goes down (C). Data Blocks get corrupted (D). All of the mentioned Answer: d Explanation: Data is replicated across different DataNodes to ensure a high degree of fault-tolerance.
8.________ is the slave/worker node and holds the user data in the form of Data Blocks.
(A). DataNode (B). NameNode (C). Data block (D). Replication Answer: a Explanation: A DataNode stores data in the [HadoopFileSystem]. A functional filesystem has more than one DataNode, with data replicated across them.
9.HDFS provides a command line interface called __________ used to interact with HDFS. (A). “HDFS Shell” (B). “FS Shell” (C). “DFS Shell” (D). None of the mentioned Answer: b Explanation: The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS).
10.HDFS is implemented in _____________ programming language. (A). C++ (B). Java (C). Scala (D). None of the mentioned Answer: b Explanation: HDFS is implemented in Java and any computer which can run Java can host a NameNode/DataNode on it.
11.For YARN, the ___________ Manager UI provides host and port information. (A). Data Node (B). NameNode (C). Resource (D). Replication Answer: c Explanation: All the metadata related to HDFS including the information about data nodes, files stored on HDFS, and Replication, etc. are stored and maintained on the NameNode.
12.Point out the correct statement. (A). The Hadoop framework publishes the job flow status to an internally running web server on the master nodes of the Hadoop cluster (B). Each incoming file is broken into 32 MB by default (C). Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance (D). None of the mentioned Answer: a Explanation: The web interface for the Hadoop Distributed File System (HDFS). shows information about the Name Node itself.
13.For ________ the HBase Master UI provides information about the HBase Master uptime. (A). HBase (B). Oozie (C). Kafka (D). All of the mentioned Answer: a Explanation: HBase Master UI provides information about the number of live, dead and transitional servers, logs, ZooKeeper information, debug dumps, and thread stacks.
14.During start up, the ___________ loads the file system state from the fsimage and the edits log file. (A). DataNode (B). NameNode (C). ActionNode (D). None of the mentioned Answer: b Explanation: HDFS is implemented on any computer which can run Java can host a NameNode/DataNode on it
15. A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the JobTracker. (A). MapReduce (B). Mapper (C). TaskTracker (D). JobTracker Answer: c Explanation: TaskTracker receives the information necessary for the execution of a Task from JobTracker, Executes the Task, and Sends the Results back to JobTracker.
16.Point out the correct statement. (A). MapReduce tries to place the data and the compute as close as possible (B). Map Task in MapReduce is performed using the Mapper(). function (C). Reduce Task in MapReduce is performed using the Map(). function (D). All of the mentioned Answer: a Explanation: This feature of MapReduce is “Data Locality”.
17.___________ part of the MapReduce is responsible for processing one or more chunks of data and producing the output results. (A). Maptask (B). Mapper (C). Task execution (D). All of the mentioned Answer: a Explanation: Map Task in MapReduce is performed using the Map(). function.
18._________ function is responsible for consolidating the results produced by each of the Map(). functions/tasks. (A). Reduce (B). Map (C). Reducer (D). All of the mentioned Answer: a Explanation: Reduce function collates the work and resolves the results.
19.Point out the wrong statement. (A). A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner (B). The MapReduce framework operates exclusively on <key, value> pairs (C). Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods (D). None of the mentioned Answer: d Explanation: The MapReduce framework takes care of scheduling tasks, monitoring them and re-executes the failed tasks.
20.Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in ____________
(A). Java (B). C (C). C# (D). None of the mentioned Answer: a Explanation: Hadoop Pipes is a SWIG-compatible C++ API to implement MapReduce applications (non JNI based).
21.________ is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer. (A). Hadoop Strdata (B). Hadoop Streaming (C). Hadoop Stream (D). None of the mentioned Answer: b Explanation: Hadoop streaming is one of the most important utilities in the Apache Hadoop distribution.
22.__________ maps input key/value pairs to a set of intermediate key/value pairs. (A). Mapper (B). Reducer (C). Both Mapper and Reducer (D). None of the mentioned Answer: a Explanation: Maps are the individual tasks that transform input records into intermediate records.
23.The number of maps is usually driven by the total size of ____________ (A). inputs (B). outputs (C). tasks (D). None of the mentioned Answer: a Explanation: Total size of inputs means the total number of blocks of the input files.
24._________ is the default Partitioner for partitioning key space. (A). HashPar (B). Partitioner (C). HashPartitioner (D). None of the mentioned Answer: c Explanation: The default partitioner in Hadoop is the HashPartitioner which has a method called getPartition to partition.
25.Running a ___________ program involves running mapping tasks on many or all of the nodes in our cluster. (A). MapReduce (B). Map (C). Reducer (D). All of the mentioned Answer: a Explanation: In some applications, component tasks need to create and/or write to side-files, which differ from the actual job-output files.
**********Module - 3 (NoSQL)**********
1.Following represent column in NoSQL __________. (A). Database (B). Field (C). Document (D). Collection Answer -B
2.What is the aim of NoSQL? (A). NoSQL provides an alternative to SQL databases to store textual data. (B). NoSQL databases allow storing non-structured data. (C). NoSQL is not suitable for storing structured data. (D). NoSQL is a new data format to store large datasets. Answer- D
3.__________ is an online NoSQL developed by Cloudera. (A). HCatalog (B). Hbase (C). Impala (D). Oozie Answer-B
4.Which of the following is not a NoSQL database? (A). SQL Server (B). MongoDB (C). Cassandra (D). None of the mentioned Answer-A
5.Which of the following is a NoSQL Database Type? (A). SQL (B). Document databases (C). JSON (D). All of the mentioned Answer-B
6.Following represent column in NoSQL __________. (A). Database (B). Field (C). Document (D). Collection Answer-B
7.What is the aim of NoSQL? (A). NoSQL provides an alternative to SQL databases to store textual data. (B). NoSQL databases allow storing non-structured data. (C). NoSQL is not suitable for storing structured data. (D). NoSQL is a new data format to store large datasets. Answer-D
8.__________ is an online NoSQL developed by Cloudera.
(A). HCatalog (B). Hbase (C). Impala (D). Oozie Answer-B
9.Which of the following is not a NoSQL database? (A). SQL Server (B). MongoDB (C). Cassandra (D). None of the mentioned Answer-A
10.Which of the following is a NoSQL Database Type? (A). SQL (B). Document databases (C). JSON (D). All of the mentioned Answer-B
11.Following represent column in NoSQL __________. (A). Database (B). Field (C). Document (D). Collection Answer-B
12.What is the aim of NoSQL? (A). NoSQL provides an alternative to SQL databases to store textual data. (B). NoSQL databases allow storing non-structured data. (C). NoSQL is not suitable for storing structured data. (D). NoSQL is a new data format to store large datasets. Answer-D
13.__________ is an online NoSQL developed by Cloudera. (A). HCatalog (B). Hbase (C). Impala (D). Oozie Answer-B
14.Which of the following is not a NoSQL database? (A). SQL Server (B). MongoDB (C). Cassandra (D). None of the mentioned Answer-A
15.Which of the following is a NoSQL Database Type? (A). SQL (B). Document databases (C). JSON (D). All of the mentioned Answer-B
16.Following represent column in NoSQL __________. (A). Database (B). Field (C). Document (D). Collection Answer-B
17.What is the aim of NoSQL? (A). NoSQL provides an alternative to SQL databases to store textual data. (B). NoSQL databases allow storing non-structured data. (C). NoSQL is not suitable for storing structured data. (D). NoSQL is a new data format to store large datasets. Answer-D
18.__________ is an online NoSQL developed by Cloudera. (A). HCatalog (B). Hbase (C). Impala (D). Oozie Answer-B
19.Which of the following is not a NoSQL database? (A). SQL Server (B). MongoDB (C). Cassandra (D). None of the mentioned Answer-A
20.Which of the following is a NoSQL Database Type? (A). SQL (B). Document databases (C). JSON (D). All of the mentioned Answer-B
21.Following represent column in NoSQL __________. (A). Database (B). Field (C). Document (D). Collection Answer-B
22.What is the aim of NoSQL? (A). NoSQL provides an alternative to SQL databases to store textual data. (B). NoSQL databases allow storing non-structured data. (C). NoSQL is not suitable for storing structured data. (D). NoSQL is a new data format to store large datasets. Answer-D
23.__________ is an online NoSQL developed by Cloudera. (A). HCatalog (B). Hbase (C). Impala
(D). Oozie Answer-B
24.Which of the following is not a NoSQL database? (A). SQL Server (B). MongoDB (C). Cassandra (D). None of the mentioned Answer-A
25.Which of the following is a NoSQL Database Type? (A). SQL (B). Document databases (C). JSON (D). All of the mentioned Answer-B
**********Module - 4 (Mining Data Streams)**********
1.Bloom filter was proposed by : (A). Burton morris Bloom (B). Burton Howard Bloom (C). Burton Datar Bloom (D). Burton Howrd Bloom Answer : (B). Burton Howard Bloom
2. A simple space-efficient randomized data structure for representing a set in order to support membership queries (A). Bloom Filter (B). Flajolet Martin (C). DGIM (D). K-means Answer: (A). Bloom Filter
3.It is a web-based financial search engine that evaluates queries over real-time streaming financial data such as stock tickers and news feeds (A). Traderbot (B). Tradebot (C). Clickbot (D). Hyperbot Answer : (A). Traderbot
3.If the stream contains n elements with m of them unique, the FM algorithm needs a memory of ____. (A). O((m)) (B). O(log(m+1)) (C). O(log(m+2)) (D). O(log(m)) Answer : (D). O(log(m))
4.Calculate h(3). , given S=1,3,2,1,2,3,4,3,1,2,3,1 and h(x)=(6x+1). mod 5
(A). 19 (B). 10 (C). 15 (D). 16 Answer : (A). 19
5.According to Bloom filter principle, we should consider the potential effects of: (A). true positives (B). false negatives (C). false positives (D). true negatives Answer : (C). false positives
6.Who released a hash function named MurmurHash in 2008: (A). Datar Motwani (B). Austin Appleby (C). Marrianne Durrand (D). Burton Datar Bloom Answer : (B). Austin Appleby
8.The files on disks or records in databases need to be stored in Bloom filter as (A). keys (B). values (C). key-values (D). columns Answer : (C). key-values
9. If the stream contains n elements with m of them unique, the FM algorithm runs in ------------------ time (A). O(sq.rt(n)) (B). O(n+2) (C). O(n+1) (D). O(n) Answer : (D). O(n)
10.Given h(x). = x + 6 mod 32 , The binary value of h(4). : (A). 1011 (B). 1010 (C). 1110 (D). 1111 Answer : (B). 1010
11.Flajolet-Martin algorithm approximates the number of unique objects in a stream or a database in how many passes? (A). n (B). 0 (C). 1 (D). 2 Answer : (C). 1
12.What is important when the input rate of facebook data is controlled externally: (A). facebook Management (B). Query Management (C). Stream Management (D). data Management
Answer: (B). Query Management
13.Which algorithmn solution does not assume uniformity? (A). DGIM (B). FM (C). SON (D). K-MEANS Answer: (A). DGIM
14. Which query operator is unable to produce an answer until it has seen its entire input: (A). Blocking query operator (B). Discrete operator (C). continuous operator (D). Continuous operator and discrete queries Answer: (A). Blocking query operator
15. 000101 has tail length of ------- : (A). 1 (B). 2 (C). 3 (D). 0 Answer: (D). 0
16.In FM algorithm, The probability that a given h(a). ends in at least i 0’s is (A). 1 (B). 0 (C). 2^-i (D). i Answer : (C). 2^-i
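A small helper makes the tail-length idea concrete (illustrative R, not the full FM algorithm):
tail_length <- function(h) {            # number of trailing zeros in the binary form of h
  if (h == 0) return(0)
  n <- 0
  while (h %% 2 == 0) { h <- h %/% 2; n <- n + 1 }
  n
}
tail_length(10)   # 10 = 1010 in binary -> 1 trailing zero
tail_length(5)    #  5 =  101 in binary -> 0 trailing zeros
# FM estimates the number of distinct elements as roughly 2^R, where R is the largest tail length seen over the stream.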
17. Probability of a false positive in Bloom Filters depends on (A). the number of hash functions (B). the density of 1’s in the array (C). the number of hash functions and the density of 1’s in the array (D). the density of 0’s in the array Answer : (C). the number of hash functions and the density of 1’s in the array
18. It is an array of bits, together with a number of hash functions (A). Bloom filter (B). Hash Function (C). Data Stream (D). Binary input Answer : (A). Bloom filter
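A minimal Bloom-filter sketch in R; the bit-array size and the two hash functions are made up for illustration:
m <- 32                                   # size of the bit array
bits <- logical(m)
hashes <- list(function(x) (6 * x + 1) %% m,
               function(x) (3 * x + 7) %% m)
bf_add   <- function(x) for (h in hashes) bits[h(x) + 1] <<- TRUE
bf_query <- function(x) all(sapply(hashes, function(h) bits[h(x) + 1]))
bf_add(11); bf_add(25)
bf_query(11)   # TRUE: an inserted element is always reported present
bf_query(4)    # usually FALSE; a TRUE here would be a false positive (false negatives never occur)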
19. ______________ query is one that is supplied to the DSMS before any relevant data has arrived (A). Continuous queries and discrete queries (B). discrete queries (C). ad-hoc (D). pre-defined Answer : (D). pre-defined
20. Sorting used for query processing is an example of : (A). Blocking query operator (B). Blocking discrete operator (C). Blocking Continuous operator
(D). Continuous operator Answer : (A). Blocking query operator
**********Module - 5 (Finding Similar Items & Clustering)**********
1.PCY Stands for (A). Park-Chen-Yu (B). Park-Chen-You (C). Park-Check-Yu (D). Park-Check-You Answer : (A). Park-Chen-Yu
2. SON Algorithm Stands for (A). Shane,Omiecinski and Navathe (B). Savasere,Omiecinski and Navathe (C). Savare,Omienal and Navathe (D). Savasere,Omiecinski and Navarag Answer : (B). Savasere,Omiecinski and Navathe
3. Minimum Support=?,if total Transaction =5 and minimum Support=60% (A). 30 (B). 3 (C). 300 (D). 65 Answer : (B). 3
4.Minimum Support=?,if total Transaction =10 and minimum Support=60% (A). 6 (B). 0.6 (C). 10 (D). 5 Answer : (A). 6
5. How do you calculate Confidence(B -> A)? (A). Support(A ∪ B) / Support(A) (B). Support(A ∪ B) / Support(B) (C). Support(A) / Support(B) (D). Support(B) / Support(A) Answer : (B). Support(A ∪ B) / Support(B)
**********Module - 6 (Real Time Big Data Models)**********
1.Which of the following is true? (A). graph may contain no edges and many vertices
(B). graph may contain many edges and at least one vertex (C). graph may contain no edges and no vertices (D). graph may contain no vertices and many edges Answer : (B). graph may contain many edges and at least one vertex
2.Social Network is defined as (A). Collection of entities that participate in the network. (B). Collection of items in store (C). Collection of vertices & edges in a graph (D). Collection of nodes in a graph Answer : (A). Collection of entities that participate in the network.
3.Which of the following is finally produced by Hierarchical Clustering? (A). final estimate of cluster centroids (B). tree showing how close things are to each other (C). Assignment of each point to clusters (D). Assignment of each edges of clusters Answer : (B). tree showing how close things are to each other
4. Which of the following clustering requires merging approach? (A). Partitional (B). Hierarchical (C). Naive Bayes (D). K-means Answer : (B). Hierarchical
5.Which of the following function is used for k-means clustering? (A). K-means (B). Euclidean Distance (C). Heatmap (D). Correlation Similarity Answer: (A). K-means
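A minimal base-R example of k-means on the built-in iris data:
set.seed(42)
km <- kmeans(iris[, c("Sepal.Length", "Petal.Length")], centers = 3)
km$centers                        # the three cluster centroids
table(km$cluster, iris$Species)   # how the clusters line up with the species labels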
6.___________ was the pioneer in the field of web search with the use of PageRank for ranking Web pages with respect to a user query. (A). Yahoo (B). YouTube (C). Facebook (D). Google Answer : (D). Google
7.Which of the following algorithm is used by Google to determine the importance of a particular page? (A). SVD (B). PageRank (C). FastMap (D). All of the above Answer : (B). PageRank
8.One of the popular techniques of Spamdexing is ___________ (A). Clocking (B). Cooking (C). Cloaking (D). Crocking
Answer: (C). Cloaking
9.Doorway pages are_________ Web pages. (A). High quality (B). Low quality (C). Informative (D). High content Answer : (B). Low quality
10.PageRank helps in measuring ________________ of a Web page within a set of similar entries. (A). Relative importance (B). Size (C). Cost (D). All of the above Answer : (A). Relative importance
11.PageRank helps in measuring ________________ of a Web page within a set of similar entries. (A). Relative importance (B). Size (C). Cost (D). All of the above Answer : (A). Relative importance
12.Web pages with Dead ends means__________ (A). Pages with no outlinks (B). Pages with no PageRank (C). Pages with no contents (D). Pages with spam Answer : (A). Pages with no outlinks
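The idea behind PageRank can be sketched with power iteration on a made-up 3-page graph (illustrative R; column j of M is the outlink distribution of page j):
M <- matrix(c(0.0, 0.5, 0.5,    # page 1 links to pages 2 and 3
              0.5, 0.0, 0.5,    # page 2 links to pages 1 and 3
              1.0, 0.0, 0.0),   # page 3 links only to page 1 (no dead ends here)
            nrow = 3)
beta <- 0.85                    # damping factor
r <- rep(1/3, 3)                # start from a uniform rank vector
for (i in 1:50) r <- beta * (M %*% r) + (1 - beta) / 3
round(as.vector(r) / sum(r), 3) # relative importance of the three pages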
13.Topic Sensitive PageRank (TSPR). is proposed by_________ in 2003. (A). Al-Saffar (B). Bratislav V. Stojanović (C). Jianshu WENG (D). Taher H. Haveliwala Answer : (D). Taher H. Haveliwala
14.Full form of HITS is _____________ (A). High Influential Topic Search (B). High Informative Topic Search (C). Hyperlink-induced topic Search (D). None of the above Answer : (C). Hyperlink-induced topic Search
15.HITS algorithm and the PageRank algorithm both make use of the _________ to decide the relevance of the pages. (A). Link structure of the Web graph (B). Design of the Web graph (C). Content of the web pages (D). All of the above Answer : (A). Link structure of the Web graph
16.When the objective is to mine social network for patterns, a natural way to represent a social network is by a ___________
(A). Tree (B). Graph (C). Arrays (D). Lists Answer : (B). Graph
17.A social network can be considered as a___________ (A). Heterogeneous and multi relational dataset (B). LiveJournal (C). Twitter (D). DBLP Answer : (A). Heterogeneous and multi relational dataset
18.For an edge ‘e’ in a graph, ___________ of ‘e’ is defined as the number of shortest paths between all node pairs (vi, vj) in the graph such that the shortest path passes through ‘e’. (A). Edge path (B). Edge measure (C). Edge closeness (D). Edge betweenness Answer : (D). Edge betweenness
19.“You may also like these…”, “People who liked this also liked….”, this type of suggestions are from the ______________ (A). Filtering System (B). Collaborative System (C). Recommendation System (D). Amazon System Answer: (C). Recommendation System
20.An approach to a Recommendation system is to treat this as the _______________ problem using items profiles and utility matrices. (A). MapReduce (B). Social Network (C). Machine learning (D). Unstructured Answer : (C). Machine learning
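A tiny sketch of the similarity computation behind collaborative filtering (the ratings are made up):
ratings <- matrix(c(5, 3, 0,
                    4, 0, 4,
                    1, 1, 5),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("user", 1:3), c("itemA", "itemB", "itemC")))
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
cosine(ratings[, "itemA"], ratings[, "itemB"])   # item-item similarity behind "people who liked this also liked..."
cosine(ratings[, "itemA"], ratings[, "itemC"])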
***************Data Analytics MCQs Set - 1***************
1. The branch of statistics which deals with development of particular statistical methods
is classified as
1. industry statistics
2. economic statistics
3. applied statistics
4. applied statistics
Answer: applied statistics
2. Which of the following is true about regression analysis?
1. answering yes/no questions about the data
2. estimating numerical characteristics of the data
3. modeling relationships within the data
4. describing associations within the data
Answer: modeling relationships within the data
3. Text Analytics, also referred to as Text Mining?
1. True
2. False
3. Can be true or False
4. Can not say
Answer: True
4. What is a hypothesis?
1. A statement that the researcher wants to test through the data collected in a study.
2. A research question the results will answer.
3. A theory that underpins the study.
4. A statistical method for calculating the extent to which the results could have happened by
chance.
Answer: A statement that the researcher wants to test through the data collected in a study.
5. What is the cyclical process of collecting and analysing data during a single research
study called?
1. Interim Analysis
2. Inter analysis
3. inter item analysis
4. constant analysis
Answer: Interim Analysis
6. The process of quantifying data is referred to as ____
1. Topology
2. Digramming
3. Enumeration
4. coding
Answer: Enumeration
7. An advantage of using computer programs for qualitative data is that they _
1. Can reduce time required to analyse data (i.e., after the data are transcribed)
2. Help in storing and organising data
3. Make many procedures available that are rarely done by hand due to time constraints
4. All of the above
Answer: All of the Above
8. Boolean operators are words that are used to create logical combinations.
1. True
2. False
Answer: True
9. ______ are the basic building blocks of qualitative data.
1. Categories
2. Units
3. Individuals
4. None of the above
Answer: Categories
10. This is the process of transforming qualitative research data from written interviews or field notes into typed text.
1. Segmenting
2. Coding
3. Transcription
4. Mnemoning
Answer: Transcription
11. A challenge of qualitative data analysis is that it often includes data that are unwieldy and complex; it is a major challenge to make sense of the large pool of data.
1. True
2. False
Answer: True
12. Hypothesis testing and estimation are both types of descriptive statistics.
1. True
2. False
Answer: False
13. A set of data organised in a participants(rows)-by-variables(columns) format is known as a “data set.”
1. True
2. False
Answer: True
14. A graph that uses vertical bars to represent data is called a ___
1. Line graph
2. Bar graph
3. Scatterplot
4. Vertical graph
Answer: Bar graph
15. ____ are used when you want to visually examine the relationship between two
quantitative variables.
1. Bar graph
2. pie graph
3. line graph
4. Scatterplot
Answer: Scatterplot
16. The denominator (bottom) of the z-score formula is
1. The standard deviation
2. The difference between a score and the mean
3. The range
4. The mean
Answer: The standard deviation
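For example, standardising a small set of scores in R:
scores <- c(62, 70, 74, 80, 94)
z <- (scores - mean(scores)) / sd(scores)   # numerator: score minus mean; denominator: standard deviation
round(z, 2)
# scale(scores) produces the same standardised values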
17. Which of these distributions is used for testing a hypothesis?
1. Normal Distribution
2. Chi-Squared Distribution
3. Gamma Distribution
4. Poisson Distribution
Answer: Chi-Squared Distribution
18. A statement made about a population for testing purpose is called?
1. Statistic
2. Hypothesis
3. Level of Significance
4. Test-Statistic
Answer: Hypothesis
19. If the assumed hypothesis is tested for rejection considering it to be true is called?
1. Null Hypothesis
2. Statistical Hypothesis
3. Simple Hypothesis
4. Composite Hypothesis
Answer: Null Hypothesis
20. If the null hypothesis is false then which of the following is accepted?
1. Null Hypothesis
2. Positive Hypothesis
3. Negative Hypothesis
4. Alternative Hypothesis.
Answer: Alternative Hypothesis.
21. Alternative Hypothesis is also called as?
1. Composite hypothesis
2. Research Hypothesis
3. Simple Hypothesis
4. Null Hypothesis
Answer: Research Hypothesis
*************** Data Analytics MCQs Set – 2 ***************
1. What is the minimum no. of variables/ features required to perform clustering?
1. 0
2. 1
3. 2
4. 3
Answer: 1
2. For two runs of K-Mean clustering is it expected to get same clustering results?
1. Yes
2. No
Answer: No
3. Which of the following algorithms is most sensitive to outliers?
1. K-means clustering algorithm
2. K-medians clustering algorithm
3. K-modes clustering algorithm
4. K-medoids clustering algorithm
Answer: K-means clustering algorithm
4. The discrete variables and continuous variables are two types of
1. Open end classification
2. Time series classification
3. Qualitative classification
4. Quantitative classification
Answer: Quantitative classification
5. Bayesian classifiers is
1. A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
2. Any mechanism employed by a learning system to constrain the search space of a hypothesis
3. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
4. None of these
Answer: A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
6. Classification accuracy is
1. A subdivision of a set of examples into a number of classes
2. Measure of the accuracy, of the classification of a concept that is given by a certain theory
3. The task of assigning a classification to a set of examples
4. None of these
Answer: Measure of the accuracy, of the classification of a concept that is given by a certain theory
7. Euclidean distance measure is
1. A stage of the KDD process in which new data is added to the existing selection.
2. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
3. The distance between two points as calculated using the Pythagoras theorem
4. none of above
Answer: The distance between two points as calculated using the Pythagoras theorem
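A quick R check of the idea:
x <- c(2, 3); y <- c(5, 7)
sqrt(sum((x - y)^2))   # 5: straight-line distance from the Pythagorean theorem
dist(rbind(x, y))      # the built-in dist() gives the same value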
8. Hybrid is
1. Combining different types of method or information
2. Approach to the design of learning algorithms that is structured along the lines of the theory of evolution.
3. Decision support systems that contain an information base filled with the knowledge of an expert formulated in terms of if-then rules.
4. none of above
Answer: Combining different types of method or information
9. Decision trees use ________, in that they always choose the option that seems the best available at that moment.
1. Greedy Algorithms
2. divide and conquer
3. Backtracking
4. Shortest path algorithm
Answer: Greedy Algorithms
10. Discovery is
1. It is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information).
2. The process of extracting implicit, previously unknown and potentially useful information from data
3. An extremely complex molecule that occurs in human chromosomes and that carries genetic
information in the form of genes.
4. None of these
Answer: The process of extracting implicit, previously unknown and potentially useful information from data
11. Hidden knowledge referred to
1. A set of databases from different vendors, possibly using different database paradigms
2. An approach to a problem that is not guaranteed to work but performs well in most cases
3. Information that is hidden in a database and that cannot be recovered by a simple SQL query.
4. None of these
Answer: Information that is hidden in a database and that cannot be recovered by a simple SQL query.
12. Decision trees cannot handle categorical attributes with many distinct values, such as country codes for telephone numbers.
1. True
2. False
Answer: False
13. Enrichment is
1. A stage of the KDD process in which new data is added to the existing selection
2. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
3. The distance between two points as calculated using the Pythagoras theorem.
4. None of these
Answer: A stage of the KDD process in which new data is added to the existing selection
14. ________ are easy to implement and can execute efficiently even without prior knowledge of the data; they are among the most popular algorithms for classifying text documents.
1. ID3
2. Naive Bayes classifiers
3. CART
4. None of above
Answer: Naive Bayes classifiers
15. High entropy means that the partitions in classification are
1. Pure
2. Not Pure
3. Usefull
4. useless
Answer: Not Pure
16. Which of the following statements about Naive Bayes is incorrect?
1. Attributes are equally important.
2. Attributes are statistically dependent of one another given the class value.
3. Attributes are statistically independent of one another given the class value.
4. Attributes can be nominal or numeric
Answer: Attributes are statistically dependent of one another given the class value.
17. The maximum value for entropy depends on the number of classes so if we have 8 Classes what will be the max entropy.
1. Max Entropy is 1
2. Max Entropy is 2
3. Max Entropy is 3
4. Max Entropy is 4
Answer: Max Entropy is 3
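This follows from the entropy of a uniform distribution, as a quick R check shows:
p <- rep(1/8, 8)        # uniform distribution over 8 classes
-sum(p * log2(p))       # 3: the maximum entropy equals log2(number of classes)
log2(8)                 # 3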
18. Point out the wrong statement.
1. k-nearest neighbor is same as k-means
2. k-means clustering is a method of vector quantization
3. k-means clustering aims to partition n observations into k clusters
4. none of the mentioned
Answer: k-nearest neighbor is same as k-means
19. Consider the following example “How we can divide set of articles such that those articles have the same theme (we do not know the theme of the articles ahead of time) ” is this:
1. Clustering
2. Classification
3. Regression
4. None of these
Answer: Clustering
20. Can we use K Mean Clustering to identify the objects in video?
1. Yes
2. No
Answer: Yes
21. Clustering techniques are ________ in the sense that the data scientist does not determine, in advance, the labels to apply to the clusters.
1. Unsupervised
2. supervised
3. Reinforcement
4. Neural network
Answer: Unsupervised
22. ________ metric is examined to determine a reasonably optimal value of k.
1. Mean Square Error
2. Within Sum of Squares (WSS)
3. Speed
4. None of these
Answer: Within Sum of Squares (WSS)
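A common way to use WSS is the elbow plot; a minimal R sketch on the iris measurements:
set.seed(1)
d <- iris[, 1:4]
wss <- sapply(1:6, function(k) kmeans(d, centers = k, nstart = 10)$tot.withinss)
plot(1:6, wss, type = "b", xlab = "k", ylab = "Within Sum of Squares (WSS)")  # look for the "elbow"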
23. If an itemset is considered frequent, then any subset of the frequent itemset must also be frequent.
1. Apriori Property
2. Downward Closure Property
3. Either 1 or 2
4. Both 1 and 2
Answer: Both 1 and 2
24. if {bread,eggs,milk} has a support of 0.15 and {bread,eggs} also has a support of 0.15, the confidence of rule {bread,eggs} -> {milk} is
1. 0
2. 1
3. 2
4. 3
Answer: 1
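The arithmetic behind the answer:
support_XY <- 0.15                 # support of {bread, eggs, milk}
support_X  <- 0.15                 # support of the antecedent {bread, eggs}
support_XY / support_X             # confidence of {bread, eggs} -> {milk} = 1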
25. Confidence is a measure of how X and Y are really related rather than coincidentally happening together.
1. True
2. False
Answer: False
26. ________ recommend items based on similarity measures between users and/or items.
1. Content Based Systems
2. Hybrid System
3. Collaborative Filtering Systems
4. None of these
Answer: Collaborative Filtering Systems
27. There are ______ major classifications of Collaborative Filtering Mechanisms
1. 1
2. 2
3. 3
4. none of above
Answer: 2
28. Movie Recommendation to people is an example of
1. User Based Recommendation
2. Item Based Recommendation
3. Knowledge Based Recommendation
4. content based recommendation
Answer: Item Based Recommendation
29. ________ recommenders rely on an explicitly defined set of recommendation rules
1. Constraint Based
2. Case Based
3. Content Based
4. User Based
Answer: Constraint Based
30. Parallelized hybrid recommender systems operate dependently of one another and produce separate recommendation lists.
1. True
2. False
Answer: False
Data Analytics
Unit 1:
21. Facebook Tackles Big Data With ____ based on Hadoop 1. Project Prism 2. Prism 3. ProjectData 4. ProjectBid
22. Which of the following is not a phase of Data Analytics Life Cycle? 1. Communication 2. Recall 3. Data Preparation 4. Model Planning
UNIT 2: DATA ANALYSIS
1. In regression, the equation that describes how the response variable (y) is related to the explanatory variable (x) is: a. the correlation model b. the regression model c. used to compute the correlation coefficient d. None of these alternatives is correct.
2. The relationship between number of beers consumed (x) and blood alcohol content (y) was studied in 16 male college students by using least squares regression. The following regression equation was obtained from this study: ŷ = -0.0127 + 0.0180x The above equation implies that:
a. each beer consumed increases blood alcohol by 1.27%
b. on average it takes 1.8 beers to increase blood alcohol content by 1%
c. each beer consumed increases blood alcohol by an average of amount of 1.8%
d. each beer consumed increases blood alcohol by exactly 0.018
3. SSE can never be
a. larger than SST
b. smaller than SST
c. equal to 1
d. equal to zero
4. Regression modeling is a statistical framework for developing a mathematical equation that describes how
a. one explanatory and one or more response variables are related
b. several explanatory and several response variables response are related
c. one response and one or more explanatory variables are related
d. All of these are correct.
5. In regression analysis, the variable that is being predicted is the
a. response, or dependent, variable
b. independent variable
c. intervening variable
d. is usually x
6 Regression analysis was applied to return rates of sparrowhawk colonies. Regression analysis was used to study the relationship between return rate (x: % of birds that return to the colony in a given year) and immigration rate (y: % of new adults that join the colony per year). The following regression equation
was obtained: ŷ = 31.9 – 0.34x Based on the above estimated regression equation, if the return rate were to decrease by 10% the rate of immigration to the colony would:
a. increase by 34%
b. increase by 3.4%
c. decrease by 0.34%
d. decrease by 3.4%
7. In least squares regression, which of the following is not a required assumption about the error term ε?
a. The expected value of the error term is one.
b. The variance of the error term is the same for all values of x.
c. The values of the error term are independent.
d. The error term is normally distributed.
8. Larger values of r² (R²) imply that the observations are more closely grouped about the
a. average value of the independent variables
b. average value of the dependent variable
c. least squares line
d. origin
9. In a regression analysis if r² = 1, then
a. SSE must also be equal to one
b. SSE must be equal to zero
c. SSE can be any positive value
d. SSE must be negative
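A quick R illustration, using synthetic data that lies exactly on a line so the fit is perfect (r² = 1, SSE = 0):
x <- 1:10
y <- 2 + 3 * x                  # points fall exactly on a line
fit <- lm(y ~ x)
summary(fit)$r.squared          # 1
sum(residuals(fit)^2)           # SSE is (numerically) zero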
10.Which type of multivariate analysis should be used when a researcher wants to reduce a set of variables to a smaller set of composite variables by identifying underlying dimensions of the data?
A)Conjoint analysis
B)Cluster analysis
C)Multiple regression analysis
D)Factor analysis
11. Which type of multivariate analysis should be used when a researcher wants to estimate the utility that consumers associate with different product features?
A)Conjoint analysis
B)Cluster analysis
C)Multiple regression analysis
D)Factor analysis
12. Which type of multivariate analysis should be used when a researcher wants to identify subgroups of individuals that are homogeneous within subgroups and different from other subgroups?
A)Conjoint analysis
B)Cluster analysis
C)Multiple regression analysis
D)Factor analysis
13. Which type of multivariate analysis should be used when a researcher wants to predict group membership on the basis of two or more independent variables?
A)Conjoint analysis
B)Cluster analysis
C)Multiple regression analysis
D)Multiple discriminant analysis
14. Support vector machine (SVM) is a _________ classifier? a) Discriminative b) Generative
15. SVM can be used to solve ___________ problems. a) Classification b) Regression c) Clustering d) Both Classification and Regression
16. SVM is a ___________ learning algorithm. a) Supervised b) Unsupervised
17. SVM is termed as a ________ classifier. a) Minimum margin b) Maximum margin
18. The training examples closest to the separating hyperplane are called _______. a) Training vectors b) Test vectors
19. A factor analysis is…, while a principal components analysis is…
A broad term, the most commonly used technique for doing factor analysis.
B The most commonly used technique for doing factor analysis, a broad term.
C Both of the above.
D NONE OF THE ABOVE
20. Dimension Reduction is defined as-
A. It is a process of converting a data set having vast dimensions into a data set with lesser dimensions.
B. It ensures that the converted data set conveys similar information concisely.
C. All of the above
D. None of the above
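A minimal dimension-reduction sketch in Python, assuming scikit-learn is available; the random data and the choice of 3 components are illustrative only:

    # Minimal sketch of dimension reduction with PCA (scikit-learn assumed installed).
    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(100, 10)        # 100 observations, 10 original dimensions
    pca = PCA(n_components=3)          # keep 3 composite variables (components)
    X_reduced = pca.fit_transform(X)   # same observations, fewer dimensions

    print(X_reduced.shape)                  # (100, 3)
    print(pca.explained_variance_ratio_)    # how much information each component retains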
21. What is the form of Fuzzy logic? a) Two-valued logic b) Crisp set logic c) Many-valued logic d) Binary set logic
22. Traditional set theory is also known as Crisp Set theory. a) True b) False
23. The truth values of traditional set theory are ____________ and those of fuzzy sets are __________ a) Either 0 or 1, between 0 & 1 b) Between 0 & 1, either 0 or 1 c) Between 0 & 1, between 0 & 1 d) Either 0 or 1, either 0 or 1
24. Fuzzy logic is an extension of crisp set theory that handles the concept of partial truth. a) True b) False
25. The room temperature is hot. Here 'hot' (a linguistic variable) can be represented by _______ a) Fuzzy Set b) Crisp Set c) Fuzzy & Crisp Set
d) None of the mentioned
26. The values of the set membership is represented by ___________ a) Discrete Set b) Degree of truth c) Probabilities d) Both Degree of truth & Probabilities
27. Japanese were the first to utilize fuzzy logic practically on high-speed trains in Sendai. a) True b) False
28. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the following. a) AND b) OR c) NOT d) All of the mentioned
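A short Python sketch of the usual min/max/complement definitions of the fuzzy AND, OR and NOT operators mentioned above (the membership degrees 0.7 and 0.4 are made up):

    # Sketch of the common fuzzy AND / OR / NOT (min, max, complement) on membership degrees.
    a, b = 0.7, 0.4          # degrees of membership, each between 0 and 1

    fuzzy_and = min(a, b)    # 0.4
    fuzzy_or  = max(a, b)    # 0.7
    fuzzy_not = 1 - a        # 0.3 (complement)

    print(fuzzy_and, fuzzy_or, fuzzy_not)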
29. There are also other operators, more linguistic in nature, called __________ that can be applied to fuzzy set theory. a) Hedges b) Lingual Variable c) Fuzz Variable d) None of the mentioned
30. Fuzzy logic is usually represented as ___________ a) IF-THEN-ELSE rules b) IF-THEN rules c) Both IF-THEN-ELSE rules & IF-THEN rules d) None of the mentioned
31. Like relational databases there does exists fuzzy relational databases. a) True b) False
32. ______________ is/are the way/s to represent uncertainty. a) Fuzzy Logic b) Probability c) Entropy d) All of the mentioned
33. ____________ are algorithms that learn from their more complex environments (hence eco) to generalize, approximate and simplify solution logic. a) Fuzzy Relational DB b) Ecorithms c) Fuzzy Set d) None of the mentioned
Unit 3:
1 : What do you mean by sampling of stream data?
1. Sampling reduces the amount of data fed to a subsequent data mining algorithm. 2. Sampling reduces the diversity of the data stream. 3. Sampling aims to keep statistical properties of the data intact. 4. Sampling algorithms often don't need multiple passes over the data.
Question 2 : if Distance measure d(x, y)= d(y, x) then it is called
1. Symmetric 2. identical 3. positiveness 4. triangle inequality
Question 3 : NOSQL is
1. Not only SQL 2. Not SQL 3. Not Over SQL 4. No SQL
Question 4 : Find the L1 and L2 distances between the points (5, 6, 7) and (8, 2, 4).
1. L1 =10 , L2 = 5.83 2. L1 =10 , L2 = 5 3. L1 =11 , L2 = 4.9
4. L1 =9 , L2 = 5.83
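A worked Python check of Question 4 (the point coordinates come from the question; everything else is illustration):

    # L1 (Manhattan) and L2 (Euclidean) distances between (5, 6, 7) and (8, 2, 4).
    import math

    x = (5, 6, 7)
    y = (8, 2, 4)

    l1 = sum(abs(a - b) for a, b in zip(x, y))               # |5-8| + |6-2| + |7-4| = 10
    l2 = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))  # sqrt(9 + 16 + 9) ~ 5.83

    print(l1, round(l2, 2))   # 10 5.83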
Question 5 : The time between elements of one stream
1. need not be uniform 2. need to be uniform 3. must be 1ms. 4. must be 1ns
Question 6 : A Reduce task receives
1. one or more keys and their associated value list 2. key value pair 3. list of keys and their associated values 4. list of key value pairs
Question 7 : Which of the following statements about data streaming is true?
1. Stream data is always unstructured data. 2. Stream data often has a high velocity. 3. Stream elements cannot be stored on disk. 4. Stream data is always structured data.
Question 8 : Hadoop is the solution for:
1. Database software 2. Big Data Software 3. Data Mining software 4. Distribution software
Question 9 : ETL stands for ________________
1. Extraction transformation and loading 2. Extract Taken Lend 3. Enterprise Transfer Load 4. Entertainment Transference Load
Question 10 : “Sharding” a database across many server instances can be achieved with _______________
1. MAN 2. LAN 3. WAN 4. SAN
Question 11 : Neo4j is an example of which of the following NoSQL architectural pattern?
1. Key-value store 2. Graph Store 3. Document Store 4. Column-based Store
Question 12 : CSV and JSON can be described as
1. Structured data 2. Unstructured data 3. Semi-structured data 4. Multi-structured data
Question 13 : The hardware term used to describe Hadoop hardware requirements is
1. Commodity firmware
2. Commodity software 3. Commodity hardware 4. Cluster hardware
Question 14 : Which of the following is not a Hadoop Distributions?
1. MAPR 2. Cloudera 3. Hortonworks 4. RMAP
Question 15 : Which of the following Operation can be implemented with Combiners?
1. Selection 2. Projection 3. Natural Join 4. Union
Question 16 : ________ stores are used to store information about networks, such as social connections.
1. Key-value 2. Wide-column 3. Document 4. graph
Question 17 : The DGIM algorithm was developed to estimate the count of 1's occurring within the last k bits of a stream window of size N. Which of the following statements is true about the estimate of the number of 0's based on DGIM?
1. The number of 0's cannot be estimated at all. 2. The number of 0's can be estimated with a maximum guaranteed error
3. To estimate the number of 0s and 1s with a guaranteed maximum error, DGIM has to be employed twice, once creating buckets based on 1's and once creating buckets based on 0's. 4. Determine whether an element has already occurred in previous stream data.
Question 18 : If size of file is 4 GB and block size is 64 MB then number of mappers required for MapReduce task is
1. 8 2. 16 3. 32 4. 64
Question 19 : Which of the following is not the default daemon of Hadoop?
1. Namenode 2. Datanode 3. Job Tracker 4. Job history server
Question 20 : In Bloom filter an array of n bits is initialized with
1. all 0s 2. all 1s 3. half 0s and half 1s 4. all -1
Question 21 : _____________is a batch-based, distributed computing framework modeled after Google’s paper.
1. MapCompute 2. MapReuse 3. MapCluster
4. MapReduce
Question 22 : What is the edit distance between A=father and B=feather ?
1. 5 2. 1 3. 4 4. 2
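A small Python sketch of Levenshtein edit distance as one common way to compute the answer to Question 22; an insert/delete-only edit distance gives the same value for this pair:

    # Levenshtein edit distance; "father" -> "feather" needs a single insertion.
    def edit_distance(a, b):
        # dp[i][j] = edit distance between a[:i] and b[:j]
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(len(a) + 1):
            dp[i][0] = i
        for j in range(len(b) + 1):
            dp[0][j] = j
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                               dp[i][j - 1] + 1,         # insertion
                               dp[i - 1][j - 1] + cost)  # substitution
        return dp[-1][-1]

    print(edit_distance("father", "feather"))   # 1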
Question 23 : Sliding window operations typically fall in the category
1. OLTP Transactions 2. Big Data Batch Processing 3. Big Data Real Time Processing 4. Small Batch Processing
Question 24 : _________ systems focus on the relationship between users and items for recommendation.
1. DGIM 2. Collaborative-Filtering 3. Content Based and Collaborative Filtering 4. Content Based
Question 25 : Find Hamming Distance for vectors A=100101011 B=100010010
1. 2 2. 4 3. 3 4. 1
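A one-line Python check of Question 25, counting the positions where the two bit vectors differ:

    # Hamming distance between the two bit strings from the question.
    a = "100101011"
    b = "100010010"
    hamming = sum(1 for x, y in zip(a, b) if x != y)
    print(hamming)   # 4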
Question 26 : During start up, the ___________ loads the file system state from the fsimage and the edits log file.
1. Datanode 2. Namenode 3. Secondary Namenode 4. Rack awareness policy
Question 27 : What is finally produced by Hierarchical Agglomerative Clustering?
1. final estimate of cluster centroids 2. assignment of each point to clusters 3. tree showing how close things are to each other 4. Group of clusters
Question 28 : The Jaccard similarity of two non-binary sets A and B, is defined by__________
1. Jaccard Index 2. Primary Index 3. Secondary Index 4. Clustered Index
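For illustration, a tiny Python sketch of the Jaccard index |A ∩ B| / |A ∪ B| on made-up sets:

    # Jaccard similarity of two sets: size of intersection divided by size of union.
    A = {1, 2, 3, 4}
    B = {3, 4, 5}
    jaccard = len(A & B) / len(A | B)
    print(jaccard)   # 2 / 5 = 0.4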
Question 29 : Following is based on grid like street geography of the New York:
1. Manhattan Distance 2. Edit Distance 3. Hamming distance 4. Lp distance
Question 30 : The FM-sketch algorithm can be used to:
1. Estimate the number of distinct elements. 2. Sample data with a time-sensitive window. 3. Estimate the frequent elements. 4. Determine whether an element has already occurred in previous stream data.
Question 31 : Pick a hash function h that maps each of the N elements to at least log₂N bits. If R is the maximum number of trailing zeros observed in the hashed values, the estimated number of distinct elements is
1. 2^R 2. 2^(-R) 3. 1-(2^R) 4. 1-(2^(-R))
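A rough Python sketch of the Flajolet-Martin idea behind Question 31; the stream and the use of Python's built-in hash are illustrative assumptions, not part of the question:

    # Hash each element, track the maximum number R of trailing zeros seen,
    # and estimate the number of distinct elements as 2^R.
    def trailing_zeros(n):
        if n == 0:
            return 0
        count = 0
        while n % 2 == 0:
            n //= 2
            count += 1
        return count

    stream = [4, 7, 4, 9, 7, 1, 4, 12, 9, 7]     # illustrative stream
    R = 0
    for element in stream:
        h = hash(element) & 0xFFFFFFFF           # map the element to a bit string
        R = max(R, trailing_zeros(h))

    print(2 ** R)    # estimated number of distinct elements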
Question 32 : Which of the following is not a characteristic of stream data?
1. Continuous 2. ordered 3. persistent 4. huge
Question 33 : Which of the following is a column-oriented database that runs on top of HDFS
1. Hive 2. Sqoop 3. Hbase 4. Flume
Question 34 : Which of the following decides the number of partitions that are created on the local file system of the worker nodes?
1. Number of map tasks 2. Number of reduce tasks 3. Number of file input splits 4. Number of distinct keys in the intermediate key-value pairs
Question 35 : Which of the following is not the class of points in BFR algorithm
1. Discard Set (DS) 2. Compression Set (CS) 3. Isolation Set (IS) 4. Retained Set (RS)
Question 36 : Which of the following is not true for 5v?
1. Volume 2. variable 3. Velocity 4. value
Question 37 : Which algorithm is used to find fully connected subgraphs in social media mining?
1. CURE 2. CPM 3. SimRank 4. Girvan-Newman Algorithm
Question 38 : A ________________ query Q is a query that is issued once over a database D, and then logically runs continuously over the data in D until Q is terminated.
1. One-time Query 2. Standing Query 3. Adhoc Query
4. General Query
Question 39 : Effect of Spider trap on page rank
1. a particular page gets the highest page rank 2. all the pages of the web will get 0 page rank 3. no effect on any page 4. affects a particular set of pages
Question 40 : Which of the following is correct option for MongoDB
1. MongoDB is column oriented data store 2. MongoDB uses XML more in comparison with JSON 3. MongoDB is a document store database 4. MongoDB is a key-value data store
Question 41 : _________ systems focus on the relationship between users and items for recommendation.
1. DGIM 2. Collaborative-Filtering 3. Content Based and Collaborative Filtering 4. Content Based
Question 42 : The graphical representation of an SNA is made up of links and _____________.
1. People 2. Networks 3. Nodes 4. Computers
Question 43 : Hadoop is a framework that works with a variety of related tools. Common hadoop ecosystem include ____________
1. MapReduce, Hummer and Iguana 2. MapReduce, Hive and HBase 3. MapReduce, MySQL and Google Apps 4. MapReduce, Heron and Trumpet
Question 44 : About data streaming, Which of the following statements is true?
1. Stream data is always unstructured data. 2. Stream data often has a high velocity. 3. Stream elements cannot be stored on disk. 4. Stream data is always structured data.
Question 45 : Which of the following is a NoSQL Database Type ?
1. SQL 2. JSON 3. Document databases 4. CSV
Question 46 : Techniques for fooling search engines into believing your page is about something it is not, are called _____________.
1. term spam 2. page rank 3. phishing 4. dead ends
Question 47 : The police set up checkpoints at randomly selected road locations, then inspected every driver at those locations. What type of sample is this?
1. Simple Random Sample 2. Stratified Random Sample 3. Cluster Random Sample 4. Uniform sampling
Question 48 : Which of the following statements about standard Bloom filters is correct?
1. It is possible to delete an element from a Bloom filter. 2. A Bloom filter always returns the correct result. 3. It is possible to alter the hash functions of a full Bloom filter to create more space. 4. A Bloom filter always returns TRUE when testing for a previously added element.
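A minimal Bloom-filter sketch in Python illustrating Questions 20 and 48: the bit array starts as all 0s, added elements always test TRUE, false positives are possible, and deletion is not supported. The hashing scheme below is an illustrative assumption:

    # Minimal Bloom filter: n bits, all initialised to 0, k hash functions per element.
    import hashlib

    n = 64                       # number of bits
    bits = [0] * n               # initialised with all 0s

    def positions(item, k=3):
        # derive k bit positions from k salted hashes (illustrative only)
        return [int(hashlib.sha256(f"{i}{item}".encode()).hexdigest(), 16) % n
                for i in range(k)]

    def add(item):
        for p in positions(item):
            bits[p] = 1

    def might_contain(item):
        return all(bits[p] for p in positions(item))

    add("alice")
    print(might_contain("alice"))   # True (always true for added elements)
    print(might_contain("bob"))     # usually False, but can be a false positive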
Question 49 : Which of the following is responsible for managing the cluster resources and use them for scheduling users’ applications?
1. Hadoop Common 2. YARN 3. HDFS 4. MapReduce
Question 50 : ___________ is related to inconsistency in the data, which hampers the data analysis process and creates hurdles for those who wish to analyze this form of data.
1. Variability 2. Variety 3. Volume 4. Complexity
Unit 4:
Question 1 This clustering algorithm terminates when mean values computed for the current iteration of the algorithm are identical to the computed mean values for the previous iteration Select one:
a. K-Means clustering
b. conceptual clustering
c. expectation maximization
d. agglomerative clustering Show Answer
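A minimal one-dimensional K-means sketch in Python showing the termination rule in Question 1: the loop stops when the newly computed means are identical to those of the previous iteration (data and initial centroids are made up):

    # Tiny 1-D K-means with k = 2, stopping when the means no longer change.
    data = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
    means = [1.0, 9.0]                        # illustrative initial centroids

    while True:
        clusters = [[], []]
        for x in data:                        # assign each point to the nearest mean
            idx = min(range(2), key=lambda i: abs(x - means[i]))
            clusters[idx].append(x)
        new_means = [sum(c) / len(c) for c in clusters]
        if new_means == means:                # identical to the previous iteration -> stop
            break
        means = new_means

    print(means)   # [1.5, 8.5]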
Question 2 This clustering approach initially assumes that each data instance represents a single cluster. Select one:
a. expectation maximization
b. K-Means clustering
c. agglomerative clustering
d. conceptual clustering Show Answer
Question 3 The correlation coefficient for two real-valued attributes is – 0.85. What does this value tell you? Select one:
a. The attributes are not linearly related.
b. As the value of one attribute decreases the value of the second attribute increases.
c. As the value of one attribute increases the value of the second attribute also increases.
d. The attributes show a linear relationship Show Answer
Question 4 Time Complexity of k-means is given by Select one:
a. O(mn)
b. O(tkn)
c. O(kn)
d. O(t2kn) Show Answer
Question 5 Given a rule of the form IF X THEN Y, rule confidence is defined as the conditional probability that Select one:
a. Y is false when X is known to be false.
b. Y is true when X is known to be true.
c. X is true when Y is known to be true
d. X is false when Y is known to be false.
Question 6 Chameleon is Select one:
a. Density based clustering algorithm
b. Partitioning based algorithm
c. Model based algorithm
d. Hierarchical clustering algorithm
Question 7 In _________ clusterings, points may belong to multiple clusters Select one:
a. Non-exclusive
b. Partial
c. Fuzzy
d. Exclusive Show Answer
Question 8 Find odd man out Select one:
a. DBSCAN
b. K mean
c. PAM
d. K medoid
Question 9 Which statement is true about the K-Means algorithm? Select one:
a. The output attribute must be categorical.
b. All attribute values must be categorical.
c. All attributes must be numeric
d. Attribute values may be either categorical or numeric
Question 10 This data transformation technique works well when minimum and maximum values for a real-valued attribute are known. Select one:
a. z-score normalization
b. min-max normalization
c. logarithmic normalization
d. decimal scaling
Question 11 The number of iterations in apriori ___________ Select one:
a. increases with the size of the data
b. decreases with the increase in size of the data
c. increases with the size of the maximum frequent set
d. decreases with increase in size of the maximum frequent set Show Answer
Question 12 Which of the following are interestingness measures for association rules? Select one:
a. recall
b. lift
c. accuracy
d. compactness Show Answer
Question 13 Which one of the following is not a major strength of the neural network approach? Select one:
a. Neural network learning algorithms are guaranteed to converge to an optimal solution
b. Neural networks work well with datasets containing noisy data.
c. Neural networks can be used for both supervised learning and unsupervised clustering
d. Neural networks can be used for applications that require a time element to be included in the data Show Answer
Question 14 Find odd man out Select one:
a. K medoid
b. K mean
c. DBSCAN
d. PAM
Question 15 Given a frequent itemset L, if |L| = k, then there are Select one:
a. 2^k – 1 candidate association rules
b. 2^k candidate association rules
c. 2k – 2 candidate association rules
d. 2^k – 2 candidate association rules Show Answer
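A short Python sketch illustrating why a frequent itemset of size k yields 2^k - 2 candidate rules (shown here for k = 3; the item names are made up):

    # Every non-empty proper subset becomes an antecedent; its complement is the consequent.
    from itertools import combinations

    L = frozenset({"A", "B", "C"})           # k = 3
    rules = []
    for r in range(1, len(L)):               # proper, non-empty subsets only
        for antecedent in combinations(L, r):
            consequent = L - set(antecedent)
            rules.append((set(antecedent), consequent))

    print(len(rules))                         # 2**3 - 2 = 6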
Question 16 . _________ is an example for case based-learning Select one:
a. Decision trees
b. Neural networks
c. Genetic algorithm
d. K-nearest neighbor Show Answer
Question 17 The average positive difference between computed and desired outcome values. Select one:
a. mean positive error
b. mean squared error
c. mean absolute error
d. root mean squared error Show Answer
Question 18 Frequent item sets is Select one:
a. Superset of only closed frequent item sets
b. Superset of only maximal frequent item sets
c. Subset of maximal frequent item sets
d. Superset of both closed frequent item sets and maximal frequent item sets Show Answer
Question 19 Assume that we have a dataset containing information about 200 individuals. A supervised data mining session has discovered the following rule: IF age < 30 & credit card insurance = yes THEN life insurance = yes, with Rule Accuracy: 70% and Rule Coverage: 63%. How many individuals in the class life insurance = no have credit card insurance and are less than 30 years old? Select one:
a. 63
b. 30
c. 38
d. 70 Show Answer
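A worked arithmetic sketch for Question 19, under the usual reading that coverage is the fraction of all 200 individuals matched by the rule's antecedent and accuracy is the fraction of those classified correctly:

    # Coverage and accuracy arithmetic for the rule in Question 19.
    covered   = 0.63 * 200        # 126 individuals under 30 with credit card insurance
    correct   = 0.70 * covered    # ~88 of them actually have life insurance = yes
    incorrect = covered - correct # ~38 have life insurance = no
    print(round(incorrect))       # 38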
Question 20 Use the three-class confusion matrix below to answer: what percent of the instances were correctly classified?
                Computed Decision
           Class 1   Class 2   Class 3
Class 1       10         5         3
Class 2        5        15         3
Class 3        2         2         5
Select one:
a. 60
b. 40
c. 50
d. 30 Show Answer
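A worked Python check of Question 20, taking overall accuracy as the sum of the diagonal divided by the total count:

    # Accuracy from the three-class confusion matrix.
    matrix = [[10,  5, 3],
              [ 5, 15, 3],
              [ 2,  2, 5]]
    correct = sum(matrix[i][i] for i in range(3))   # 10 + 15 + 5 = 30
    total   = sum(sum(row) for row in matrix)       # 50
    print(100 * correct / total)                    # 60.0 percent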
Question 21 Which of the following is cluster analysis? Select one:
a. Simple segmentation
b. Grouping similar objects
c. Labeled classification
d. Query results grouping Show Answer
Question 22 A good clustering method will produce high quality clusters with Select one:
a. high inter class similarity
b. low intra class similarity
c. high intra class similarity
d. no inter class similarity Show Answer
Question 23 Which two parameters are needed for DBSCAN Select one:
a. Min threshold
b. Min points and eps
c. Min sup and min confidence
d. Number of centroids Show Answer
Question 24 Which statement is true about neural network and linear regression models? Select one:
a. Both techniques build models whose output is determined by a linear sum of weighted input attribute values.
b. The output of both models is a categorical attribute value.
c. Both models require numeric attributes to range between 0 and 1.
d. Both models require input attributes to be numeric. Show Answer
Question 25 In Apriori algorithm, if 1 item-sets are 100, then the number of candidate 2 item-sets are Select one:
a. 100
b. 4950
c. 200
d. 5000
Show Answer
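A quick check of Question 25: with 100 frequent 1-itemsets, the candidate 2-itemsets are all pairs, C(100, 2):

    # Number of candidate 2-itemsets generated from 100 frequent 1-itemsets.
    from math import comb
    print(comb(100, 2))   # 100 * 99 / 2 = 4950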
Question 26 Significant Bottleneck in the Apriori algorithm is Select one:
a. Finding frequent itemsets
b. Pruning
c. Candidate generation
d. Number of iterations Show Answer
Question 27 The concept of core, border and noise points fall into this category? Select one:
a. DENCLUE
b. Subspace clustering
c. Grid based
d. DBSCAN Show Answer
Question 28 The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you? Select one:
a. The attributes show a linear relationship
b. The attributes are not linearly related.
c. As the value of one attribute increases the value of the second attribute also increases.
d. As the value of one attribute decreases the value of the second attribute increases. Show Answer
Question 29 Machine learning techniques differ from statistical techniques in that machine learning methods Select one:
a. are better able to deal with missing and noisy data
b. typically assume an underlying distribution for the data
c. have trouble with large-sized datasets
d. are not able to explain their behavior. Show Answer
Question 30 The probability of a hypothesis before the presentation of evidence. Select one:
a. a priori
b. posterior
c. conditional
d. subjective Show Answer
Question 31 KDD represents extraction of Select one:
a. data
b. knowledge
c. rules
d. model Show Answer
Question 32 Which statement about outliers is true? Select one:
a. Outliers should be part of the training dataset but should not be present in the test data.
b. Outliers should be identified and removed from a dataset.
c. The nature of the problem determines how outliers are used
d. Outliers should be part of the test dataset but should not be present in the training data. Show Answer
Question 33 The most general form of distance is Select one:
a. Manhattan
b. Euclidean
c. Mean
d. Minkowski Show Answer
Question 34 Arbitrary shaped clusters can be found by using Select one:
a. Density methods
b. Partitional methods
c. Hierarchical methods
d. Agglomerative Show Answer
Question 35 Which Association Rule would you prefer? Select one:
a. High support and medium confidence
b. High support and low confidence
c. Low support and high confidence
d. Low support and low confidence Show Answer
Question 36 With Bayes theorem, the probability of hypothesis H, specified by P(H), is referred to as Select one:
a. a conditional probability
b. an a priori probability
c. a bidirectional probability
d. a posterior probability Show Answer
Question 37 In a Rule based classifier, If there is a rule for each combination of attribute values, what do you called that rule set R Select one:
a. Exhaustive
b. Inclusive
c. Comprehensive
d. Mutually exclusive Show Answer
Question 38 The apriori property means Select one:
a. If a set cannot pass a test, its supersets will also fail the same test
b. To decrease the efficiency, do level-wise generation of frequent item sets
c. To improve the efficiency, do level-wise generation of frequent item sets
d. If a set can pass a test, its supersets will fail the same test Show Answer
Question 39 If an item set ‘XYZ’ is a frequent item set, then all subsets of that frequent item set are Select one:
a. Undefined
b. Not frequent
c. Frequent
d. Can not say Show Answer
Question 40 Clustering is ___________ and is example of ____________learning Select one:
a. Predictive and supervised
b. Predictive and unsupervised
c. Descriptive and supervised
d. Descriptive and unsupervised Show Answer
Question 41 The probability that a person owns a sports car given that they subscribe to automotive magazine is 40%. We also know that 3% of the adult population subscribes to automotive magazine. The probability of a person owning a sports car given that they don't subscribe to automotive magazine is 30%. Use this information to compute the probability that a person subscribes to automotive magazine given that they own a sports car. Select one:
a. 0.0368
b. 0.0396
c. 0.0389
d. 0.0398 Show Answer
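A worked Bayes' theorem computation for Question 41 (M = subscribes to the magazine, S = owns a sports car):

    # P(M | S) via Bayes' theorem with the probabilities given in the question.
    p_s_given_m     = 0.40   # P(S | M)
    p_m             = 0.03   # P(M)
    p_s_given_not_m = 0.30   # P(S | not M)

    p_s = p_s_given_m * p_m + p_s_given_not_m * (1 - p_m)   # total probability of S
    p_m_given_s = p_s_given_m * p_m / p_s                   # Bayes' theorem

    print(round(p_m_given_s, 4))   # 0.0396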
Question 42 Simple regression assumes a __________ relationship between the input attribute and output attribute. Select one:
a. quadratic
b. inverse
c. linear
d. reciprocal Show Answer
Question 43 Which of the following algorithm comes under the classification Select one:
a. Apriori
b. Brute force
c. DBSCAN
d. K-nearest neighbor Show Answer
Question 44 Hierarchical agglomerative clustering is typically visualized as? Select one:
a. Dendrogram
b. Binary trees
c. Block diagram
d. Graph Show Answer
Question 45 The _______ step eliminates the extensions of (k-1)-itemsets which are not found to be frequent from being considered for counting support. Select one:
a. Partitioning
b. Candidate generation
c. Itemset eliminations
d. Pruning Show Answer
Question 46 To determine association rules from frequent item sets Select one:
a. Only minimum confidence needed
b. Neither support not confidence needed
c. Both minimum support and confidence are needed
d. Minimum support is needed Show Answer
Question 47 What is the final resultant cluster size in Divisive algorithm, which is one of the hierarchical clustering approaches? Select one:
a. Zero
b. Three
c. singleton
d. Two Show Answer
Question 48 If {A,B,C,D} is a frequent itemset, candidate rules which is not possible is Select one:
a. C –> A
b. D –>ABCD
c. A –> BC
d. B –> ADC Show Answer
Question 49 Which Association Rule would you prefer Select one:
a. High support and low confidence
b. Low support and high confidence
c. Low support and low confidence
d. High support and medium confidence Show Answer
Question 50 The probability that a person owns a sports car given that they subscribe to automotive magazine is 40%. We also know that 3% of the adult population subscribes to automotive magazine. The probability of a person owning a sports car given that they don’t subscribe to automotive magazine is 30%. Use this information to compute the probability that a person subscribes to automotive magazine given that they own a sports car
Select one:
a. 0.0398
b. 0.0389
c. 0.0368
d. 0.0396 Show Answer
Unit 5:
1. What is true about Data Visualization?
A. Data Visualization is used to communicate information clearly and efficiently to users by the usage of information graphics such as tables and charts. B. Data Visualization helps users in analyzing a large amount of data in a simpler way. C. Data Visualization makes complex data more accessible, understandable, and usable. D. All of the above
2. Data can be visualized using?
A. graphs B. charts C. maps D. All of the above
3. Data visualization is also an element of the broader _____________.
A. deliver presentation architecture B. data presentation architecture C. dataset presentation architecture D. data process architecture
4. Which method shows hierarchical data in a nested format?
A. Treemaps B. Scatter plots C. Population pyramids D. Area charts
5. Which is used to inference for 1 proportion using normal approx?
A. fisher.test() B. chisq.test() C. Lm.test() D. prop.test()
6. Which is used to find the factor congruence coefficients?
A. factor.mosaicplot B. factor.xyplot C. factor.congruence D. factor.cumsum
7. Which of the following is tool for checking normality?
A. qqline() B. qline() C. anova() D. lm()
8. Which of the following is false?
A. Data visualization includes the ability to absorb information quickly B. Data visualization is another form of visual art C. Data visualization decreases insights and leads to slower decisions D. None of the above
9. Common use cases for data visualization include?
A. Politics B. Sales and marketing C. Healthcare D. All of the above
10. Which of the following plots are often used for checking randomness in time series?
A. Autocausation B. Autorank C. Autocorrelation D. None of the above
11. Which are pros of data visualization?
A. It can be accessed quickly by a wider audience. B. It can misrepresent information C. It can be distracting D. None Of the above
12. Which are cons of data visualization?
A. It conveys a lot of information in a small space. B. It makes your report more visually appealing.
C. visual data is distorted or excessively used. D. None Of the above
13. Which of the intricate techniques is not used for data visualization?
A. Bullet Graphs B. Bubble Clouds C. Fever Maps D. Heat Maps
14. Which one of the following is most basic and commonly used techniques?
A. Line charts B. Scatter plots C. Population pyramids D. Area charts
15. Which is used to query and edit graphical settings?
A. anova() B. par() C. plot() D. cum()
16. Which of the following method make vector of repeated values?
A. rep() B. data() C. view() D. read()
17. Who calls the lower level functions lm.fit?
A. lm() B. col.max
C. par D. histo
18. Which of the following lists names of variables in a data.frame?
A. par() B. names() C. barchart() D. quantile()
19. Which of the following statements is true?
A. Scientific visualization, sometimes referred to in shorthand as SciVis B. Healthcare professionals frequently use choropleth maps to visualize important health data. C. Candlestick charts are used as trading tools and help finance professionals analyze price movements over time D. All of the above
20. ________is used for density plots?
A. par B. lm C. kde D. C
Answer key:
Unit :1
1
Ans : D
Explanation: Data Analysis is a process of inspecting, cleaning, transforming and modelling data with the goal of discovering useful information, suggesting conclusions and supporting decision-making.
2. Ans : B
Explanation: Predictive Analytics is a major data analysis approach, not Predictive Intelligence.
3. Ans : A
Explanation: In data analysis, two main statistical methodologies are used: descriptive statistics and inferential statistics.
4. Ans : C
Explanation: In descriptive statistics, data from the entire population or a sample is summarized with numerical descriptors.
5. Ans : D
Explanation: Data Analysis was defined by the statistician John Tukey in 1961 as "procedures for analyzing data".
6. Ans : A
Explanation: answering yes/no questions about the data (hypothesis testing)
7. Ans : A
Explanation: The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
8. Ans : D
Explanation: The branch of statistics which deals with development of particular statistical methods is classified as applied statistics.
9.
Ans : C
Explanation: modeling relationships within the data (E.g. regression analysis).
10 Ans : A
Explanation: Text Data Mining is the process of deriving high-quality information from text.
11 personalization
12.
CRM analytics
13. business intelligence
14. database marketing
15. hosted CRM
16. All of the above
17. Cascalog
18. All of these
19. All the above
20. All of the above
21. Project Prism
22. Recall
UNIT 2:
1. b
2. c
3. A
4. c
5. a
6. b
7. a
8. c
9. B
10. D
11. A
12. B
13. D
14.A
15. D
16. A
17. B
18. C
19. A broad term, the most commonly used technique for doing factor analysis.
20. C
21. Answer: c Explanation: With fuzzy logic set membership is defined by certain value. Hence it could have many values to be in the set.
22. Answer: a Explanation: In traditional set theory, set membership is fixed or exact: either the member is in the set or not, with only two crisp values, true or false. In fuzzy logic there are many values; with some weight x the member is in the set.
23. Answer: a Explanation: Refer to the definitions of fuzzy set and crisp set.
24. Answer: a Explanation: None.
25. Answer: a Explanation: Fuzzy logic deals with linguistic variables.
26. Answer: b Explanation: Both probabilities and degrees of truth range between 0 and 1.
27. Answer: a Explanation: None.
28. Answer: d Explanation: The AND, OR, and NOT operators of Boolean logic exist in fuzzy logic, usually defined as the minimum, maximum, and complement.
29. Answer: a Explanation: None.
30. Answer: b Explanation: Fuzzy set theory defines fuzzy operators on fuzzy sets. The problem in applying this is that the appropriate fuzzy operator may not be known. For this reason, fuzzy logic usually uses IF-THEN rules, or constructs that are equivalent, such as fuzzy associative matrices. Rules are usually expressed in the form: IF variable IS property THEN action.
31. Answer: a Explanation: Once fuzzy relations are defined, it is possible to develop fuzzy relational databases. The first fuzzy relational database, FRDB, appeared in Maria Zemankova's dissertation.
32. Answer: d Explanation: Entropy is the amount of uncertainty involved in data, represented by H(data).
33. Answer: c Explanation: Local structure is usually associated with linear rather than exponential growth in complexity.
Unit 4:
1. K-Means clustering
2. agglomerative clustering
3. As the value of one attribute decreases the value of the second attribute increases.
4. O(tkn)
5. Y is true when X is known to be true
6. Hierarchical clustering algorithm
7. Fuzzy
8. DBSCAN
9. All attributes must be numeric
10. min-max normalization
11. increases with the size of the maximum frequent set
12. lift
13. Neural network learning algorithms are guaranteed to converge to an optimal solution
14. DBSCAN
15. 2^k – 2 candidate association rules
16. K-nearest neighbor
17. mean absolute error
18. Superset of both closed frequent item sets and maximal frequent item sets
19. 38
20. 60
21. Grouping similar objects
22. high intra class similarity
23. Min points and eps
24. Both models require input attributes to be numeric.
25. 4950
26. Candidate generation
27. DBSCAN
28. As the value of one attribute decreases the value of the second attribute increases.
29. are better able to deal with missing and noisy data
30. a priori
31. knowledge
32. The nature of the problem determines how outliers are used
33. Minkowski
34. Density methods
35. Low support and high confidence
36. an a priori probability
37. Exhaustive
38. If a set cannot pass a test, its supersets will also fail the same test
39. Frequent
40. Descriptive and unsupervised
41. 0.0396
42. linear
43. K-nearest neighbor
44. Dendrogram
45. Pruning
46. Both minimum support and confidence are needed
47. singleton
48. D –> ABCD
49. Low support and high confidence
50. 0.0396
Unit 5:
1. Ans : D
Explanation: Data Visualization is used to communicate information clearly and efficiently to users by the usage of information graphics such as tables and charts. It helps users in analyzing a large amount of data in a simpler way. It makes complex data more accessible, understandable, and usable.
2.
Ans : D
Explanation: Data visualization is a graphical representation of quantitative information and data by using visual elements like graphs, charts, and maps.
3. Ans : B
Explanation: Data visualization is also an element of the broader data presentation architecture (DPA) discipline, which aims to identify, locate, manipulate, format and deliver data in the most efficient way possible.
4.
Ans : A
Explanation: Treemaps are best used when multiple categories are present, and the goal is to compare different parts of a whole.
5
Ans : D
Explanation: prop.test() is used for inference on one proportion using the normal approximation.
6. Ans : C
Explanation: factor.congruence is used to find the factor congruence coefficients.
7. Ans : A
Explanation: qqline() (together with qqnorm()) is a tool for checking normality.
8. Ans : C
Explanation: "Data visualization decreases insights and leads to slower decisions" is a false statement.
9. Ans : D
Explanation: All option are Common use cases for data visualization.
10. Ans : C
Explanation: If the time series is random, such autocorrelations should be near zero for any and all timelag separations.
11. Ans : A
Explanation: Pros of data visualization : it can be accessed quickly by a wider audience.
12.
Ans : C
Explanation: It can be distracting : if the visual data is distorted or excessively used.
13. Ans : C
Explanation: Fever maps are not used for data visualization; fever charts are used instead.
14. Ans : A
Explanation: Line charts. This is one of the most basic and common techniques used. Line charts display how variables can change over time.
15. Ans : B
Explanation: par() is used to query and edit graphical settings.
16. Ans : A
Explanation: rep() makes a vector of repeated values; data() loads a built-in dataset (often into a data.frame).
17. Ans : A
Explanation: lm calls the lower level functions lm.fit.
18.
Ans : B
Explanation: names() lists the names of the variables (columns) in a data.frame.
19.
Ans : D
Explanation: All option are correct.
20. Ans : C
Explanation: kde is used for density plots.
MCQ for UNIT 5
1. Point out the correct statement. a) Hadoop is an ideal environment for extracting and transforming small volumes of data b) Hadoop stores data in HDFS and supports data compression/decompression c) The Giraph framework is less useful than a MapReduce job to solve graph and machine learning d) None of the mentioned
2. Which of the following genres does Hadoop produce? a) Distributed file system b) JAX-RS c) Java Message Service d) Relational Database Management System
3. Which of the following platforms does Hadoop run on? a) Bare metal b) Debian c) Cross-platform d) Unix-like
4. Hadoop achieves reliability by replicating the data across multiple hosts and hence does not require ________ storage on hosts. a) RAID b) Standard RAID levels c) ZFS d) Operating system
5. The Hadoop list includes the HBase database, the Apache Mahout ________ system, and matrix operations. a) Machine learning b) Pattern recognition c) Statistical classification d) Artificial intelligence
6. As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including _______________ a) Improved data storage and information retrieval b) Improved extract, transform and load features for data integration c) Improved data warehousing functionality d) Improved security, workload management, and SQL support
7. Point out the correct statement. a) Hadoop do need specialized hardware to process the data b) Hadoop 2.0 allows live stream processing of real-time data c) In the Hadoop programming framework output files are divided into lines or records d) None of the mentioned
8. According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop? a) Big data management and data mining b) Data warehousing and business intelligence c) Management of Hadoop clusters d) Collecting and storing unstructured data
9. Hadoop is a framework that works with a variety of related tools. Common cohorts include ____________ a) MapReduce, Hive and HBase b) MapReduce, MySQL and Google Apps c) MapReduce, Hummer and Iguana d) MapReduce, Heron and Trumpet
10. Point out the wrong statement. a) Hardtop processing capabilities are huge and its real advantage lies in the ability to process terabytes & petabytes of data b) Hadoop uses a programming model called “MapReduce”, all the programs should conform to this model in order to work on the Hadoop platform c) The programming model, MapReduce, used by Hadoop is difficult to write and test d) All of the mentioned
11. What was Hadoop named after? a) Creator Doug Cutting’s favorite circus act b) Cutting’s high school rock band c) The toy elephant of Cutting’s son d) A sound Cutting’s laptop made during Hadoop development
12. All of the following accurately describe Hadoop, EXCEPT ____________ a) Open-source b) Real-time c) Java-based d) Distributed computing approach
13. __________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data. a) MapReduce b) Mahout
c) Oozie d) All of the mentioned
14. __________ has the world’s largest Hadoop cluster. a) Apple b) Datamatics c) Facebook d) None of the mentioned
15. Facebook Tackles Big Data With _______ based on Hadoop. a) ‘Project Prism’ b) ‘Prism’ c) ‘Project Big’ d) ‘Project Data’
16. ________ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets. a) Pig Latin b) Oozie c) Pig d) Hive
17. Point out the correct statement. a) Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data b) Hive is a relational database with SQL support c) Pig is a relational database with SQL support d) All of the mentioned
18. Hive also support custom extensions written in ____________ a) C# b) Java c) C d) C++
19. Point out the wrong statement. a) Elastic MapReduce (EMR) is Facebook’s packaged Hadoop offering b) Amazon Web Service Elastic MapReduce (EMR) is Amazon’s packaged Hadoop offering c) Scalding is a Scala API on top of Cascading that removes most Java boilerplate d) All of the mentioned
20. ___________ is general-purpose computing model and runtime system for distributed data analytics. a) Mapreduce b) Drill
c) Oozie d) None of the mentioned
21. The Pig Latin scripting language is not only a higher-level data flow language but also has operators similar to ____________ a) SQL b) JSON c) XML d) All of the mentioned
22. _______ jobs are optimized for scalability but not latency. a) Mapreduce b) Drill c) Oozie d) Hive
23. A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the JobTracker. a) MapReduce b) Mapper c) TaskTracker d) JobTracker
24. Point out the correct statement. a) MapReduce tries to place the data and the compute as close as possible b) Map Task in MapReduce is performed using the Mapper() function c) Reduce Task in MapReduce is performed using the Map() function d) All of the mentioned
25. ___________ part of the MapReduce is responsible for processing one or more chunks of data and producing the output results. a) Maptask b) Mapper c) Task execution d) All of the mentioned
26. _________ function is responsible for consolidating the results produced by each of the Map() functions/tasks. a) Reduce b) Map c) Reducer d) All of the mentioned
27. ________ is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer.
a) Hadoop Strdata b) Hadoop Streaming c) Hadoop Stream d) None of the mentioned
28. __________ maps input key/value pairs to a set of intermediate key/value pairs. a) Mapper b) Reducer c) Both Mapper and Reducer d) None of the mentioned
29. The number of maps is usually driven by the total size of ____________ a) inputs b) outputs c) tasks d) None of the mentioned
30. Running a ___________ program involves running mapping tasks on many or all of the nodes in our cluster. a) MapReduce b) Map c) Reducer d) All of the mentioned
31. A ________ serves as the master and there is only one NameNode per cluster. a) Data Node b) NameNode c) Data block d) Replication
32. Point out the correct statement. a) DataNode is the slave/worker node and holds the user data in the form of Data Blocks b) Each incoming file is broken into 32 MB by default c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance d) None of the mentioned
33. HDFS works in a __________ fashion. a) master-worker b) master-slave c) worker/slave d) all of the mentioned
34. Point out the wrong statement. a) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file level b) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode c) User data is stored on the local file system of DataNodes d) DataNode is aware of the files to which the blocks stored on it belong to
35. Which of the following scenario may not be a good fit for HDFS? a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file b) HDFS is suitable for storing data related to applications requiring low latency data access c) HDFS is suitable for storing data related to applications requiring low latency data access d) None of the mentioned
36. The need for data replication can arise in various scenarios like ____________ a) Replication Factor is changed b) DataNode goes down c) Data Blocks get corrupted d) All of the mentioned
37. ________ is the slave/worker node and holds the user data in the form of Data Blocks. a) DataNode b) NameNode c) Data block d) Replication
38. HDFS provides a command line interface called __________ used to interact with HDFS. a) “HDFS Shell” b) “FS Shell” c) “DFS Shell” d) None of the mentioned
39. HDFS is implemented in _____________ programming language. a) C++ b) Java c) Scala d) None of the mentioned
40. For YARN, the ___________ Manager UI provides host and port information. a) Data Node b) NameNode
c) Resource d) Replication
41. During start up, the ___________ loads the file system state from the fsimage and the edits log file. a) DataNode b) NameNode c) ActionNode d) None of the mentioned
42. Point out the correct statement. a) A Hadoop archive maps to a file system directory b) Hadoop archives are special format archives c) A Hadoop archive always has a *.har extension d) All of the mentioned
43. Using Hadoop Archives in __________ is as easy as specifying a different input filesystem than the default file system. a) Hive b) Pig c) MapReduce d) All of the mentioned
44. Pig operates in mainly how many nodes? a) Two b) Three c) Four d) Five
45. Point out the correct statement. a) You can run Pig in either mode using the “pig” command b) You can run Pig in batch mode using the Grunt shell c) You can run Pig in interactive mode using the FS shell d) None of the mentioned
46. You can run Pig in batch mode using __________ a) Pig shell command b) Pig scripts c) Pig options d) All of the mentioned
47. Pig Latin statements are generally organized in one of the following ways? a) A LOAD statement to read data from the file system b) A series of “transformation” statements to process the data
c) A DUMP statement to view results or a STORE statement to save the results d) All of the mentioned
48. Point out the wrong statement. a) To run Pig in local mode, you need access to a single machine b) The DISPLAY operator will display the results to your terminal screen c) To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation d) All of the mentioned
49. Which of the following function is used to read data in PIG? a) WRITE b) READ c) LOAD d) None of the mentioned
50. You can run Pig in interactive mode using the ______ shell. a) Grunt b) FS c) HDFS d) None of the mentioned
51. HBase is a distributed ________ database built on top of the Hadoop file system. a) Column-oriented b) Row-oriented c) Tuple-oriented d) None of the mentioned
52. Point out the correct statement. a) HDFS provides low latency access to single rows from billions of records (Random access) b) HBase sits on top of the Hadoop File System and provides read and write access c) HBase is a distributed file system suitable for storing large files d) None of the mentioned
53. HBase is ________ defines only column families. a) Row Oriented b) Schema-less c) Fixed Schema d) All of the mentioned
54. Apache HBase is a non-relational database modeled after Google’s _________ a) BigTop b) Bigtable
c) Scanner d) FoundationDB
55. Point out the wrong statement. a) HBase provides only sequential access to data b) HBase provides high latency batch processing c) HBase internally provides serialized access d) All of the mentioned
56. The _________ Server assigns regions to the region servers and takes the help of Apache ZooKeeper for this task. a) Region b) Master c) Zookeeper d) All of the mentioned
57. Which of the following command provides information about the user? a) status b) version c) whoami d) user
58. Which of the following command does not operate on tables? a) enabled b) disabled c) drop d) all of the mentioned
59. _________ command fetches the contents of a row or a cell. a) select b) get c) put d) none of the mentioned
60. HBaseAdmin and ____________ are the two important classes in this package that provide DDL functionalities. a) HTableDescriptor b) HDescriptor c) HTable d) HTabDescriptor
61. Which of the following is not a NoSQL database? a) SQL Server b) MongoDB
c) Cassandra d) None of the mentioned
62. Point out the correct statement. a) Documents can contain many different key-value pairs, or key-array pairs, or even nested documents b) MongoDB has official drivers for a variety of popular programming languages and development environments c) When compared to relational databases, NoSQL databases are more scalable and provide superior performance d) All of the mentioned
63. Which of the following is a NoSQL Database Type? a) SQL b) Document databases c) JSON d) All of the mentioned
64. Which of the following is a wide-column store? a) Cassandra b) Riak c) MongoDB d) Redis
65. Point out the wrong statement. a) Non Relational databases require that schemas be defined before you can add data b) NoSQL databases are built to allow the insertion of data without a predefined schema c) NewSQL databases are built to allow the insertion of data without a predefined schema d) All of the mentioned
66. Most NoSQL databases support automatic __________ meaning that you get high availability and disaster recovery. a) processing b) scalability c) replication d) all of the mentioned
67. Which of the following are the simplest NoSQL databases? a) Key-value b) Wide-column c) Document d) All of the mentioned
68. ________ stores are used to store information about networks, such as social connections. a) Key-value b) Wide-column c) Document d) Graph
69. NoSQL databases is used mainly for handling large volumes of ______________ data. a) unstructured b) structured c) semi-structured d) all of the mentioned
70. Point out the wrong statement? a) Key feature of R was that its syntax is very similar to S b) R runs only on Windows computing platform and operating system c) R has been reported to be running on modern tablets, phones, PDAs, and game consoles d) R functionality is divided into a number of Packages
71. R functionality is divided into a number of ________ a) Packages b) Functions c) Domains d) Classes
72. Which Package contains most fundamental functions to run R? a) root b) child c) base d) parent
73. Point out the wrong statement? a) One nice feature that R shares with many popular open source projects is frequent releases b) R has sophisticated graphics capabilities c) S’s base graphics system allows for very fine control over essentially every aspect of a plot or graph d) All of the mentioned
74. Which of the following is a base package for R language? a) util b) lang
c) tools d) spatial
75. Which of the following is “Recommended” package in R? a) util b) lang c) stats d) spatial
76. What is the output of getOption(“defaultPackages”) in R studio? a) Installs a new package b) Shows default packages in R c) Error d) Nothing will print
77. Which of the following is used for Statistical analysis in R language? a) RStudio b) Studio c) Heck d) KStudio
78. In R language, a vector is defined that it can only contain objects of the ________ a) Same class b) Different class c) Similar class d) Any class
79. A list is represented as a vector but can contain objects of ___________ a) Same class b) Different class c) Similar class d) Any class
80. How can we define ‘undefined value’ in R language? a) Inf b) Sup c) Und d) NaN
81. What is NaN called? a) Not a Number b) Not a Numeric c) Number and Number d) Number a Numeric
82. How can we define ‘infinity’ in R language? a) Inf b) Sup c) Und d) NaN
83. Which one of the following is not a basic datatype? a) Numeric b) Character c) Data frame d) Integer
84. Matrices can be created by row-binding with the help of the following function. a) rjoin() b) rbind() c) rowbind() d) rbinding()
85. What is the function used to test objects (returns a logical operator) if they are NA? a) is.na() b) is.nan() c) as.na() d) as.nan()
86. What is the function used to test objects (returns a logical operator) if they are NaN? a) as.nan() b) is.na() c) as.na() d) is.nan()
87. What is the function to set column names for a matrix? a) names() b) colnames() c) col.names() d) column name cannot be set for a matrix
88. The most convenient way to use R is at a graphics workstation running a ________ system. a) windowing b) running c) interfacing d) matrix
89. Point out the wrong statement? a) Setting up a workstation to take full advantage of the customizable features of R is a
straightforward thing b) q() is used to quit the R program c) R has an inbuilt help facility similar to the man facility of UNIX d) Windows versions of R have other optional help systems also
90. Point out the wrong statement? a) Windows versions of R have other optional help system also b) The help.search command (alternatively ??) allows searching for help in various ways c) R is case insensitive as are most UNIX based packages, so A and a are different symbols and would refer to different variables d) $ R is used to start the R program
91. Elementary commands in R consist of either _______ or assignments. a) utilstats b) language c) expressions d) packages
92. How to install for a package and all of the other packages on which for depends? a) install.packages (for, depends = TRUE) b) R.install.packages (“for”, depends = TRUE) c) install.packages (“for”, depends = TRUE) d) install (“for”, depends = FALSE)
93. __________ function is used to watch for all available packages in library. a) lib() b) fun.lib() c) libr() d) library()
94. Attributes of an object (if any) can be accessed using the ______ function. a) objects() b) attrib() c) attributes() d) obj()
95. R objects can have attributes, which are like ________ for the object. a) metadata b) features c) expression d) dimensions
96. ________ generate random Normal variates with a given mean and standard deviation. a) dnorm b) rnorm
c) pnorm d) rpois
97. Point out the correct statement? a) R comes with a set of pseudo-random number generators b) Random number generators cannot be used to model random inputs c) Statistical procedure does not require random number generation d) For each probability distribution there are typically three functions
98. ______ evaluate the cumulative distribution function for a Normal distribution. a) dnorm b) rnorm c) pnorm d) rpois
99. _______ generate random Poisson variates with a given rate. a) dnorm b) rnorm c) pnorm d) rpois
100. Point out the wrong statement? a) For each probability distribution there are typically three functions b) For each probability distribution there are typically four functions c) r function is sufficient for simulating random numbers d) R comes with a set of pseudo-random number generators
101. _________ is the most common probability distribution to work with. a) Gaussian b) Parametric c) Paradox d) Simulation
102. Point out the correct statement? a) When simulating any random numbers it is not essential to set the random number seed b) It is not possible to generate random numbers from other probability distributions like the Poisson c) You should always set the random number seed when conducting a simulation d) Statistical procedure does not require random number generation
103. _______ function is used to simulate binary random variables. a) dnorm b) rbinom() c) binom() d) rpois
104. Point out the wrong statement? a) Drawing samples from specific probability distributions can be done with “s” functions b) The sample() function draws randomly from a specified set of (scalar) objects allowing you to sample from arbitrary distributions of numbers c) The sampling() function draws randomly from a specified set of objects d) You should always set the random number seed when conducting a simulation
105. _______ grammar makes a clear distinction between your data and what gets displayed on the screen or page. a) ggplot1 b) ggplot2 c) d3.js d) ggplot3
106. Point out the wrong statement? a) mean_se is used to calculate mean and standard errors on either side b) hmisc wraps up a selection of summary functions from Hmisc to make it easy to use c) plot is used to create a scatterplot matrix (experimental) d) translate_qplot_base is used for translating between qplot and base graphics
107. Which of the following cuts numeric vector into intervals of equal length? a) cut_interval b) cut_time c) cut_number d) cut_date
108. Which of the following is a plot to investigate the order in which observations were recorded? a) ggplot b) ggsave c) ggpcp d) ggorder
109. ________ is used for translating between qplot and base graphics. a) translate_qplot_base b) translate_qplot_gpl c) translate_qplot_lattice d) translate_qplot_ggplot
110. Which of the following is discrete state calculator? a) discrete_scale b) ggpcp c) ggfluctuation d) ggmissing
111. Which of the following creates fluctuation plot? a) ggmissplot b) ggmissing c) ggfluctuation d) ggpcp
112. __________ create a complete ggplot appropriate to a particular data type. a) autoplot b) is.ggplot c) printplot d) qplot_ggplot
113. Which of the following creates a new ggplot plot from a data frame? a) qplot_ggplot b) ggplot.data.frame c) ggfluctuation d) ggmissplot
Department of Information Technology
DATA ANALYTICS – KIT601 – Question Bank
UNIT-1
1. Data originally collected in the process of investigation are known as a) Foreign data b) Primary data c) Third data d) Secondary data e) None of these
2. Statistical enquiry means a) It is science for knowledge b) Search for knowledge c) Collection of anything d) Search for knowledge with the help of statistical methods e) None of these
3. Cluster sampling means a) Sample is divided into number of sub-groups b) Sample are selected at regular interval c) Sample is obtained by conscious selection d) Universe is divided into groups e) None of these
4. What is Secondary data? a) Data collected in the process of investigation b) Data collected from some other agency c) Data collected from questionnaire of a person d) Both A & B e) None of these
5. What is information? a) Raw facts b) Processed data c) Understanding facts d) Knowing action on data e) None of these
6. Data about rocks is an example of a) Time dependent data b) Time Independent data c) Location dependent data d) Location independent data e) None of these
7. Range on temperature scale is termed as a) Nominal data b) Ordinal data
Department of Information Technology
c) Interval data d) Ratio data e) None of these
8. Data in XML and CSV format is an example of a) Structure data b) Un-structure data c) Semi-structure data d) Both A & B e) None of these
9. Which is not the characteristic of data a) Accuracy b) Consistency c) Granularity d) Redundant e) None of these
10. Hadoop is a framework that works with a variety of related tools. Common cohorts include: a) MapReduce, Hive and HBase b) MapReduce, MySQL and Google Apps c) MapReduce, Hummer and Iguana d) MapReduce, Heron and Trumpet e) None of these
11. Which is not the V in BIG data a) Volume b) Veracity c) Vigor d) Velocity e) None of these
12. Which is not true about Traditional decision making? a) Does not require human intervention b) Takes a long time to come to decision c) Lacks systematic linkage in planning d) Provides limited scope of data analytics e) None of these
13. Cloudera is a product of a) Microsoft b) Apache c) Google d) Facebook e) None of these
14. What is not true about MPP architecture? a) Tightly coupled nodes b) High speed connection among nodes c) Disks are not shared
d) Uses a lot of processors e) None of these
15. The process of organizing and summarizing data in an easily readable format to communicate important information is known as a) Analysis b) Reporting c) Clustering d) Mining e) None of these
16. Out of the following which is not a type of report a) Canned b) Dashboard c) Ad hoc response d) Alerts e) None of these
17. Data Analysis is a process of? a) inspecting data b) cleaning data c) transforming data d) All of above e) None of these
18. Which of the following is not a major data analysis approaches? a) Data Mining b) Predictive Intelligence c) Business Intelligence d) Text Analytics e) None of these
19. How many main statistical methodologies are used in data analysis? a) 2 b) 3 c) 4 d) 5 e) None of these
20. Which of the following is true about regression analysis? a) answering yes/no questions about the data b) estimating numerical characteristics of the data c) modeling relationships within the data d) describing associations within the data e) None of these
21. __________ may be defined as the data objects that do not comply with the general behavior or model of the data available. a) Outlier Analysis b) Evolution Analysis
c) Prediction d) Classification e) None of these
22. What is the use of data cleaning? a) to remove the noisy data b) correct the inconsistencies in data c) transformations to correct the wrong data. d) All of the above e) None of these
23. In data mining, this is a technique used to predict future behavior and anticipate the consequences of change. a) predictive technology b) disaster recovery c) phase change d) predictive modeling e) None of these
24. What are the main components of Big Data? a) MapReduce b) HDFS c) HBASE d) All of these e) None of these
25. ———- is data that depends on the data model and resides in a fixed field within a record. a) Structured data b) Un-Structured data c) Semi-Structured data d) Scattered e) None of these
26. —————- is about developing code to enable the machine to learn to perform tasks, and its basic principle is the automatic modeling of the underlying processes that have generated the collected data. a) Data Science b) Data Analytics c) Data Mining d) Data Warehousing e) None of these
27. —————– is an example of human generated unstructured data. a) YouTube data b) Satellite data c) Sensor data d) Seismic imagery data e) None of these
28. Height is an example of which type of attribute a) Nominal b) Binary c) Ordinal d) Numeric e) None of these
29. ————- type of analytics describes what happened in the past a) Descriptive b) Prescriptive c) Predictive d) Probability e) None of these
30. ————– data does not fit into a data model due to variations in content a) Structured data b) Un-Structured data c) Semi-Structured data d) Both B & C e) None of these
UNIT-2
31. A and B are two events. If P(A, B) decreases while P(A) increases, which of the following is true? a) P(A|B) decreases b) P(B|A) decreases c) P(B) decreases d) All of above e) None of these
32. Suppose we like to calculate P(H|E, F) and we have no conditional independence information. Which of the following sets of numbers are sufficient for the calculation? a) P(E, F), P(H), P(E|H), P(F|H) b) P(E, F), P(H), P(E, F|H) c) P(H), P(E|H), P(F|H) d) P(E, F), P(E|H), P(F|H) e) None of these
33. Suppose you are given an EM algorithm that finds maximum likelihood estimates for a model with latent variables. You are asked to modify the algorithm so that it finds MAP estimates instead. Which step or steps do you need to modify? a) Expectation b) Maximization c) No modification necessary d) Both A & B e) None of these
34. Compared to the variance of the Maximum Likelihood Estimate (MLE), the variance of the Maximum A Posteriori (MAP) estimate is ________ a) higher b) same c) lower d) it could be any of the above e) None of these
35. One reason Bayesian methods are important to our study of machine learning is that they provide a useful perspective for understanding many learning algorithms that do not ............................ manipulate probabilities. a) explicitly b) implicitly c) both a & b d) approximately e) None of these
36. The results that we get after we apply Bayesian Theorem to a problem are, a) 100% accurate b) Estimated values c) Wrong values d) Only positive values e) None of these
37. The previous probabilities in Bayes theorem that are changed with the help of new available information are classified as a) independent probabilities b) posterior probabilities c) interior probabilities d) dependent probabilities e) None of these
38. In contrast to the naive Bayes classifier, Bayesian belief networks allow stating conditional independence assumptions that apply to ............................... of the variables. a) subsets b) super sets c) empty set d) All of above e) None of these
39. The naive Bayes classifier applies to learning tasks where each instance x is described by a conjunction of attribute values and where the target function f ( x ) can take on ................. value from some................... set V. a) one, finite b) any, infinite c) one, infinite d) any, finite e) None of these
40. Bayes rule can be used to........................conditioned on one piece of evidence. a) solve queries b) increase complexity of a query c) decrease complexity of a query d) answer probabilistic queries e) None of these
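For reference, a minimal worked sketch in Python (with made-up probabilities) of how Bayes' rule answers a probabilistic query conditioned on one piece of evidence:

# Hypothetical numbers: P(H) prior, P(E|H) and P(E|not H) likelihoods
p_h = 0.01
p_e_given_h = 0.95
p_e_given_not_h = 0.05
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)   # total probability of the evidence
p_h_given_e = p_e_given_h * p_h / p_e                   # posterior P(H|E)
print(round(p_h_given_e, 3))                            # about 0.161: the prior is updated by the evidence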
41. Among which of the following mentioned statements can the Bayesian probability be applied? (i) In the cases, where we have one event (ii) In the cases, where we have two events (iii) In the cases, where we have three events (iv) In the cases, where we have more than three events
Options:
a) Only iv. b) All i., ii., iii. and iv. c) ii. and iv. d) Only ii. e) None of these
42. How the Bayesian network can be used to answer any query? a) Full distribution b) Joint distribution
c) Partial distribution d) All of the mentioned above e) None of these
43. Which of the following methods do we use to find the best fit line for data in Linear Regression? a) Least Square Error b) Maximum Likelihood c) Logarithmic Loss d) Both A and B e) None of these
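As an illustration (a minimal sketch with made-up data, assuming NumPy is available), fitting the best-fit line by Least Square Error:

import numpy as np
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])
A = np.vstack([x, np.ones_like(x)]).T           # design matrix for y = a*x + b
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)  # minimises the sum of squared errors
print(a, b)                                     # slope and intercept of the best-fit line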
44. Linear Regression is a ..................... machine learning algorithm. a) supervised b) unsupervised c) reinforcement d) Both A & B e) None of these
45. Which of the following statement is true about outliers in Linear regression? a) Linear regression is not sensitive to outliers b) Linear regression is sensitive to outliers c) Can’t say d) There are no outliers e) None of these
46. Which of the following sentence is FALSE regarding regression? a) It relates inputs to outputs. b) It is used for prediction. c) It may be used for interpretation. d) It discovers causal relationships. e) None of these
47. Which of the following methods do we use to best fit the data in Logistic Regression? a) Least Square Error b) Maximum Likelihood c) Jaccard distance d) Both A & B e) None of these
48. Which of the following options is true? a) Linear Regression error values have to be normally distributed but in the case of Logistic Regression it is not the case b) Logistic Regression error values have to be normally distributed but in the case of Linear Regression it is not the case c) Both Linear Regression and Logistic Regression error values have to be normally distributed d) Neither Linear Regression nor Logistic Regression error values have to be normally distributed e) None of these
49. A decision tree is also known as a) general tree b) binary tree c) prediction tree d) fuzzy tree e) None of these
50. The confusion matrix is a useful tool for analyzing a) Regression b) Classification c) Sampling d) Cross Validation e) None of these
51. In regression the independent variable is also called as ———– a) Regressor b) Continuous c) Regressand d) Estimated e) None of these
52. ————— searches for the linear optimal separating hyperplane for separation of the data using essential training tuples called support vectors a) Decision tree b) Association Rule Mining c) Clustering d) Support vector machines e) None of these
53. Which of the following is used as attribute selection measure in decision tree algorithms? a) Information Gain b) Posterior probability c) Prior probability d) Support e) None of these
54. ———- is unsupervised technique aiming to divide a multivariate dataset into clusters or groups. a) KNN b) SVM c) Regression d) Cluster Analysis e) None of these
55. A perfect negative correlation is signified by ————- a) 1 b) -1 c) 0 d) 2
e) None of these
56. ———— rule mining is a technique to identify underlying relations between different items. a) Classification b) Regression c) Clustering d) Association e) None of these
57. ———– is supervised machine learning algorithm outputs an optimal hyperplane for given labeled training data a) KNN b) SVM c) Regression d) Decision Tree e) None of these
58. Which of the following is a measure used in decision trees for selecting the splitting criterion that partitions the data in the best possible manner? a) Probability b) Gini Index c) Regression d) Confusion matrix e) None of these
59. Which of the following is not a type of clustering algorithm? a) Density clustering b) K-Means clustering c) Centroid clustering d) Simple clustering e) None of these
60. —— answers the questions like ” How can we make it happen?” a) Descriptive b) Prescriptive c) Predictive d) Probability e) None of these
UNIT-3
61. A company wants to divide its customers into distinct groups to send offers; this is an example of a) Data Extraction b) Data Classification c) Data Discrimination d) Data Selection e) None of these
62. When do we use Manhattan distance in data mining? a) Dimension of the data decreases b) Dimension of the data increases c) Under fitting d) Moderate size of the dimensions e) None of these
63. When there is no impact on one variable when increase or decrease on other variable then it is ———— a) Perfect correlation b) Positive correlation c) Negative correlation d) No correlation e) None of these
64. Apriori algorithm uses breadth first search and ————structure to count candidate item sets efficiently. a) Decision tree b) Hash Tree c) Red-Black Tree d) AVL Tree e) None of these
65. To determine basic salary of an employee when his qualification is given is a ———– problem a) Correlation b) Regression c) Association d) Qualitative e) None of these
66. ———— is the step performed by a data scientist after acquiring the data. a) Data Cleansing b) Data Integration c) Data Replication d) Data loading e) None of these
67. ———– is an indication of how often the rule has been found to be true in association rule mining. a) Confidence
b) Support c) Lift d) Accuracy e) None of these
68. Which of the following statements about data streaming is true? a) Stream data is always unstructured data. b) Stream data often has a high velocity. c) Stream elements cannot be stored on disk. d) Stream data is always structured data. e) None of these
69. A Bloom filter guarantees no a) false positives b) false negatives c) false positives and false negatives d) false positives or false negatives, depending on the Bloom filter type e) None of these
70. The FM-sketch algorithm can be used to: a) Estimate the number of distinct elements. b) Sample data with a time-sensitive window. c) Estimate the frequent elements. d) Determine whether an element has already occurred in previous stream data. e) None of these
71. The DGIM algorithm was developed to estimate the count of 1's occurring within the last k bits of a stream window N. Which of the following statements is true about the estimate of the number of 0's based on DGIM? a) The number of 0's cannot be estimated at all. b) The number of 0's can be estimated with a maximum guaranteed error. c) To estimate the number of 0's and 1's with a guaranteed maximum error, DGIM has to be employed twice, once creating buckets based on 1's and once creating buckets based on 0's. d) Only 1's can be estimated, not 0's e) None of these
72. What are DGIM’s maximum error boundaries? a) DGIM always underestimates the true count; at most by 25% b) DGIM either underestimates or overestimates the true count; at most by 50% c) DGIM always overestimates the count; at most by 50% d) DGIM either underestimates or overestimates the true count; at most by 25% e) None of these
73. Which algorithm should be used to approximate the number of distinct elements in a data stream? a) Misra-Gries b) Alon-Matias-Szegedy c) DGIM d) Apriori e) None of these
74. Which of the following statements about standard Bloom filters is correct? a) It is possible to delete an element from a Bloom filter. b) A Bloom filter always returns the correct result. c) It is possible to alter the hash functions of a full Bloom filter to create more space. d) A Bloom filter always returns TRUE when testing for a previously added element. e) None of these
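A minimal illustrative Bloom filter sketch in Python (the class and its parameters are hypothetical, for illustration only); it shows why a standard Bloom filter has no false negatives but may return false positives:

import hashlib

class BloomFilter:
    def __init__(self, size=1000, num_hashes=3):
        self.size, self.num_hashes = size, num_hashes
        self.bits = [0] * size
    def _positions(self, item):
        # derive num_hashes bit positions from salted MD5 digests
        for i in range(self.num_hashes):
            digest = hashlib.md5(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size
    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1
    def might_contain(self, item):
        # False is always correct (no false negatives); True may be a false positive
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice@example.com")
print(bf.might_contain("alice@example.com"))  # always True once added
print(bf.might_contain("bob@example.com"))    # usually False, occasionally a false positive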
75. ETL stands for ________________ a) Extraction transformation and loading b) Extract Taken Lend c) Enterprise Transfer Load d) Entertainment Transference Load e) None of these
76. Which of the following is not a major data analysis approaches? a) Data Mining b) Predictive Intelligence c) Business Intelligence d) Text Analytics e) None of these
77. What do you mean by a Real-Time Analytics platform? a) Manages and processes data and helps timely decision making b) Helps to develop dynamic analysis applications c) Leads to evolution of non-business intelligence d) Hadoop e) None of these
78. Data Analysis is defined by the statistician? a) William S. b) Hans Peter Luhn c) Gregory Piatetsky-Shapiro d) John Tukey e) None of these
79. Which of the following is a wrong statement? a) The big volume actually represents Big Data b) Big Data is just about tons of data c) The data growth and social media explosion have changed how we look at the data d) All of these e) None of these
80. Which of the following emphasizes the discovery of previously unknown properties of the data? a) Machine Learning b) Big Data c) Data wrangling d) Data mining e) None of these
81. What are DGIM's maximum error boundaries? a) DGIM always underestimates the true count; at most by 25% b) DGIM either underestimates or overestimates the true count; at most by 50% c) DGIM always overestimates the count; at most by 50% d) DGIM either underestimates or overestimates the true count; at most by 25% e) None of these
82. A Bloom filter guarantees no a) false positives b) false negatives c) false positives and false negatives d) false positives or false negatives, depending on the Bloom filter e) None of these
83. Which of the following statements about the standard DGIM algorithm are false? a) DGIM operates on a time-based window. b) In DGIM, the size of a bucket is always a power of two. c) The maximum number of buckets has to be chosen beforehand. d) The buckets contain the count of 1's and each 1's specific position in the stream. e) None of these
84. What are two differences between large-scale computing and big data processing? a) hardware b) Data is more suitable for finding new patterns in data than Large Scale Computing c) amount of processing time available d) amount of data processed e) None of these
85. In the Flajolet-Martin algorithm, if the stream contains n elements with m of them unique, the algorithm runs in a) O(n) time b) constant time c) O(2n) time d) O(3n) time e) None of these
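A rough single-hash Flajolet-Martin sketch in Python (illustrative only; real implementations average over many hash functions); it makes one pass over the n stream elements, i.e. O(n) time:

import hashlib

def trailing_zeros(n):
    if n == 0:
        return 32
    count = 0
    while n & 1 == 0:
        n >>= 1
        count += 1
    return count

def fm_estimate(stream):
    max_r = 0
    for item in stream:
        h = int(hashlib.md5(str(item).encode()).hexdigest(), 16) & 0xFFFFFFFF
        max_r = max(max_r, trailing_zeros(h))   # longest run of trailing zero bits seen
    return 2 ** max_r                           # estimate of the number of distinct elements

print(fm_estimate([1, 2, 3, 2, 1, 4, 5, 3]))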
86. What are two differences between large-scale computing and big data processing? a) hardware b) Data is more suitable for finding new patterns in data than Large Scale Computing c) amount of processing time available d) number of passes made over the data e) None of these
87. What does it mean when an algorithm is said to 'scale well'? a) The running time does not increase exponentially when data becomes larger. b) The result quality goes up when the data becomes larger. c) The memory usage does not increase exponentially when data becomes larger. d) The result quality remains the same when the data becomes larger. e) None of these
89. The FM-sketch algorithm can be used to: a) Estimate the number of distinct elements. b) Sample data with a time-sensitive window. c) Estimate the frequent elements. d) Determine whether an element has already occurred in previous stream data. e) None of these
90. Which attribute is not indicative of data streaming? a) Limited amount of memory b) Limited amount of processing time c) Limited amount of input data d) Limited amount of processing power e) None of these
UNIT-4
91. Which of the following clustering types has the characteristic shown in the figure below?
a) Exploratory b) Inferential c) Causal d) Hierarchical Clustering e) None of these
92. Which of the following dimension types of graph is shown in the figure below?
a) one-dimensional b) two-dimensional c) three-dimensional d) four-dimensional e) None of these
93. Which of the following gave rise to the need for graphs in data analysis? a) Data visualization b) Communicating results
c) Decision making d) All of the mentioned e) None of these
94. Which of the following is a characteristic of an exploratory graph? a) Made slowly b) Axes are not cleaned up c) Color is used for personal information d) All of the mentioned e) None of these
95. Color and shape are used to add dimensions to graph data. a) True b) False c) Dilemma d) Incorrect Statement e) None of these
96. Which of the following information is not given by the five-number summary? a) Mean b) Median c) Mode d) All of the mentioned e) None of these
97. Which of the following is also referred to as an overlayed 1D plot? a) lattice b) barplot c) gplot d) all of the mentioned e) None of these
98. Spinning plots can be used for two-dimensional data. a) True b) False c) Incorrect d) Not Sure e) None of these
99. Point out the correct statement. a) coplots are one-dimensional data graphs b) Exploratory graphs are made quickly c) Exploratory graphs are made relatively less in number d) All of the mentioned e) None of these
100. Which of the following clustering techniques is used by the K-Means algorithm? a) Hierarchical technique b) Partitional technique c) Divisive
d) Agglomerative e) None of these
101. The SON algorithm is also known as a) PCY Algorithm b) Multistage Algorithm c) Multihash Algorithm d) Partition Algorithm e) None of these
102. Which technique is used to filter unnecessary itemsets in the PCY algorithm? a) Association Rule b) Hashing Technique c) Data Mining d) Market basket e) None of these
103. In association rules, which of the following indicates the measure of how frequently the items occur in a dataset? a) Support b) Confidence c) Basket d) Itemset e) None of these
104. Which term indicates the degree of correlation between X and Y in a dataset, if the given association rule is X-->Y? a) Confidence b) Monotonicity c) Distinct d) Hashing e) None of these
105. During start-up, the ___________ loads the file system state from the fsimage and the edits log file. a) DataNode b) NameNode c) ActionNode d) Data Action Node e) None of these
106. Which of the following scenarios may not be a good fit for HDFS? a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file b) HDFS is suitable for storing data related to applications requiring low latency data access c) HDFS is suitable for storing data related to applications requiring low latency data access d) HDFS is suitable for scenarios requiring multiple/simultaneous writes to the same file e) None of these
107. ________ is the slave/worker node and holds the user data in the form of Data Blocks. a) DataNode b) NameNode c) Data block d) Replication e) None of these
108. HDFS provides a command line interface called __________ used to interact with HDFS. a) "HDFS Shell" b) "FS Shell" c) "DFS Shell" d) None of the mentioned e) None of these
109. What is CLIQUE? a) CLIQUE is a grid-based method for finding density-based clusters in subspaces. b) CLIQUE is a click method c) Used to prune non-promising cells and to improve efficiency d) Used to measure distance e) None of these
110. CLIQUE stands for? a) Clustering In QUEst b) Common in Quest c) Calculate in Quest d) Click in Quest e) None of these
111. What are approaches for high-dimensional data clustering? a) Subspace clustering b) Projected clustering and Biclustering c) Data Clustering d) Space Clustering e) None of these
112. Applications of frequent itemset analysis include a) Related concepts, Plagiarism, Biomarkers b) Clustering c) Design d) Operation e) None of these
113. k-means is a ………-based algorithm or distance-based algorithm where we calculate the distances to assign a point to a cluster. a) Centroid b) Distance c) Neuron d) Dendron e) None of these
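A tiny centroid-based k-means sketch in Python (made-up data, assuming NumPy is available): each point is assigned to its nearest centroid, then centroids are recomputed as cluster means:

import numpy as np

def kmeans(points, k, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)            # nearest centroid for every point
        centroids = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return labels, centroids

pts = np.array([[1, 1], [1.2, 0.8], [5, 5], [5.1, 4.9], [9, 1], [8.8, 1.2]])
print(kmeans(pts, k=3))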
114. -------- is an algorithm for frequent itemset mining and association rule learning over relational databases. a) Confidence b) Apriori c) Disadvantage d) Market basket e) None of these
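A minimal two-pass, Apriori-style sketch in Python (toy baskets, illustrative only): the first pass keeps frequent single items, the second pass counts only candidate pairs built from them:

from itertools import combinations
from collections import Counter

baskets = [{"bread", "milk"}, {"bread", "eggs", "milk"}, {"bread", "eggs"}, {"milk", "eggs"}]
min_support = 2   # minimum number of baskets an itemset must appear in

item_counts = Counter(item for b in baskets for item in b)                 # pass 1: single items
frequent_items = {i for i, c in item_counts.items() if c >= min_support}

pair_counts = Counter(pair for b in baskets                                # pass 2: candidate pairs
                      for pair in combinations(sorted(b & frequent_items), 2))
print({p: c for p, c in pair_counts.items() if c >= min_support})          # frequent pairs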
115. The HBase database includes the Hadoop list, the Apache Mahout ________ system, and matrix operations. a) Statistical classification b) Pattern recognition c) Machine learning d) Artificial intelligence e) All of these
116. To discover interesting relations between objects in larger databases is an objective of ---- a) Frequent Set Mining b) Market basket Mining c) Association rules mining d) Confidence Gain e) None of these
117. Different methods for storing itemset counts in main memory are a) The triangular matrix method b) The triples method c) Angular method d) Square Method e) None of these
118. ------ is used to prune non-promising cells and to improve efficiency. a) Market basket b) Frequent itemset c) Support d) Apriori property e) None of these
119. Identify the algorithm in which, on the first pass, we count the items themselves and then determine which items are frequent; on the second pass, we count only the pairs of items both of which are found frequent on the first pass. a) DGIM b) CURE c) PageRank d) Apriori e) None of these
120. A resource used for sharing data globally by all nodes is a) Distributed Cache b) Centralised Cache c) Secondary memory d) Primary memory e) None of these
UNIT-5
121. Input to the ______ is the sorted output of the mappers. a) Reducer b) Mapper c) Shuffle d) All of the above e) None of these
122. Which of the following statements about data streaming is true? a) Stream data is always unstructured data. b) Stream data often has a high velocity. c) Stream elements cannot be stored on disk. d) Stream data is always structured data. e) None of these
123. The output of the ______ is not sorted in the MapReduce framework for Hadoop. a) Mapper b) Cascader c) Scalding d) None of the above e) None of these
124. Which of the following phases occur simultaneously? a) Reduce and Sort b) Shuffle and Sort c) Shuffle and Map d) Sort and Reduce e) None of these
125. A ________ serves as the master and there is only one NameNode per cluster. a) Data Node b) NameNode c) Data block d) Replication e) None of these
126. HDFS works in a __________ fashion. a) master-worker b) master-slave c) worker/slave d) all of the mentioned e) None of these
127. ________ NameNode is used when the Primary NameNode goes down. a) Rack b) Data c) Secondary d) None of the mentioned e) None of these
128. Point out the wrong statement. a) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file level b) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode c) User data is stored on the local file system of DataNodes d) DataNode is aware of the files to which the blocks stored on it belong to e) None of these
129. The need for data replication can arise in various scenarios like ____________ a) Replication Factor is changed b) DataNode goes down c) Data Blocks get corrupted d) All of the mentioned e) None of these
130. For YARN, the ___________ Manager UI provides host and port information. a) Data Node b) NameNode c) Resource d) Replication e) None of these
131. HDFS works in a __________ fashion. a) worker-master fashion b) master-slave fashion c) master-worker fashion d) slave-master e) None of these
132. HDFS is implemented in the _____________ language. a) C b) Perl c) Python d) Java e) None of these
133. The default block size in Hadoop is ______. a) 16MB b) 32MB c) 64MB d) 128MB e) None of these
134. ____ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data. a) MapReduce b) Mahout c) Oozie d) Hbase e) None of these
135. Mapper and Reducer implementations can use the ______ to report progress or just indicate that they are alive. a) Partitioner b) OutputCollector c) Reporter d) All of the above e) None of these
136. ______ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer. a) Partitioner b) OutputCollector c) Reporter d) All of the above e) None of these
137. A ______ serves as the master and there is only one NameNode per cluster. a) Data Node b) NameNode c) Data block d) Replication e) None of these
138. HDFS works in a ______ fashion. a) master-worker b) master-slave c) worker/slave d) All of the above e) None of these
139. ______ NameNode is used when the Primary NameNode goes down. a) Rack b) Data c) Secondary d) None e) None of these
140. HDFS is implemented in the ______ programming language. a) C++ b) Java c) Scala d) None e) None of these
141. Hadoop was developed by _______________ a) Larry Page b) Doug Cutting c) Mark d) Bill Gates e) None of these
142. The MapReduce algorithm contains two important tasks, namely __________. a) mapped, reduce b) mapping, Reduction c) Map, Reduction d) Map, Reduce e) None of these
143. Mapper and Reducer classes extend classes from the ______ package. a) org.apache.hadoop.mapreduce b) apache.hadoop c) org.mapreduce d) hadoop.mapreduce e) None of these
144. HDFS is inherited from the ------------- file system. a) Yahoo b) FTFS c) Google d) Rediff e) None of these
145. ______ NameNode is used when the Primary NameNode goes down. a) Rack b) Data c) Secondary d) Primary e) None of these
146. HDFS works in a ______ fashion. a) master-worker b) master-slave c) worker/slave d) All of the above e) None of these
147. A ______ serves as the master and there is only one NameNode per cluster. a) Data Node b) NameNode c) Data block d) Replication e) None of these
148. HDFS provides a command line interface called ______ used to interact with HDFS. a) HDFS Shell b) FS Shell c) DFSA Shell d) No shell e) None of these
149. ______ is the slave/worker node and holds the user data in the form of Data Blocks. a) DataNode b) NameNode c) Data block d) Replication e) None of these
150. ______ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution. a) Map Parameters b) JobConf c) MemoryConf d) All of the above e) None of these
Data Analytics KIT-601 Answer key
UNIT-1   UNIT-2   UNIT-3    UNIT-4     UNIT-5
1-b      31-b     61-b      91-d       121-a
2-d      32-b     62-b      92-b       122-b
3-d      33-b     63-d      93-d       123-d
4-b      34-c     64-b      94-c       124-a
5-b      35-a     65-d      95-a       125-b
6-c      36-b     66-a      96-c       126-a
7-c      37-b     67-a      97-a       127-c
8-c      38-a     68-b      98-a       128-a
9-d      39-d     69-b      99-a       129-d
10-a     40-d     70-a      100-b      130-c
11-c     41-d     71-b      101-d      131-b
12-a     42-b     72-b      102-b      132-d
13-b     43-a     73-e      103-a      133-c
14-a     44-a     74-d      104-a      134-a
15-b     45-b     75-a      105-b      135-c
16-c     46-d     76-b      106-a,d    136-b
17-d     47-b     77-a,b    107-a      137-b
18-b     48-a     78-d      108-b      138-a
19-a     49-c     79-b      109-a      139-c
20-c     50-b     80-d      110-a      140-b
21-a     51-a     81-b      111-a,b    141-b
22-d     52-d     82-b      112-a      142-d
23-d     53-a     83-c,d    113-a      143-a
24-d     54-d     84-a,b    114-b      144-c
25-a     55-c     85-a      115-c,d    145-c
26-b     56-d     86-b      116-c      146-b
27-a     57-b     87-a,b    117-a,b    147-b
28-d     58-b     88-c      118-b      148-b
29-a     59-d     89-a,d    119-d      149-a
30-b     60-b     90-c      120-a      150-b
*************** Data Analytics MCQs Set - 1 ***************
1. The branch of statistics which deals with development of particular statistical methods
is classified as
1. industry statistics
2. economic statistics
3. applied statistics
4. applied statistics
Answer: applied statistics
2. Which of the following is true about regression analysis?
1. answering yes/no questions about the data
2. estimating numerical characteristics of the data
3. modeling relationships within the data
4. describing associations within the data
Answer: modeling relationships within the data
3. Text Analytics, also referred to as Text Mining?
1. True
2. False
3. Can be true or False
4. Can not say
Answer: True
4. What is a hypothesis?
1. A statement that the researcher wants to test through the data collected in a study.
2. A research question the results will answer.
3. A theory that underpins the study.
4. A statistical method for calculating the extent to which the results could have happened by
chance.
Answer: A statement that the researcher wants to test through the data collected in a study.
5. What is the cyclical process of collecting and analysing data during a single research
study called?
1. Interim Analysis
2. Inter analysis
3. inter item analysis
4. constant analysis
Answer: Interim Analysis
6. The process of quantifying data is referred to as ____
1. Topology
2. Digramming
3. Enumeration
4. coding
Answer: Enumeration
7. An advantage of using computer programs for qualitative data is that they _
1. Can reduce time required to analyse data (i.e., after the data are transcribed)
2. Help in storing and organising data
3. Make many procedures available that are rarely done by hand due to time constraints
4. All of the above
Answer: All of the Above
8. Boolean operators are words that are used to create logical combinations.
1. True
2. False
Answer: True
9. ______ are the basic building blocks of qualitative data.
1. Categories
2. Units
3. Individuals
4. None of the above
Answer: Categories
10. This is the process of transforming qualitative research data from written interviews or field notes into typed text.
1. Segmenting
2. Coding
3. Transcription
4. Mnemoning
Answer: Transcription
11. A challenge of qualitative data analysis is that it often includes data that are unwieldy and complex; it is a major challenge to make sense of the large pool of data.
1. True
2. False
Answer: True
12. Hypothesis testing and estimation are both types of descriptive statistics.
1. True
2. False
Answer: False
13. A set of data organised in a participants(rows)-by-variables(columns) format is known as a “data set.”
1. True
2. False
Answer: True
14. A graph that uses vertical bars to represent data is called a ___
1. Line graph
2. Bar graph
3. Scatterplot
4. Vertical graph
Answer: Bar graph
15. ____ are used when you want to visually examine the relationship between two
quantitative variables.
1. Bar graph
2. pie graph
3. line graph
4. Scatterplot
Answer: Scatterplot
16. The denominator (bottom) of the z-score formula is
1. The standard deviation
2. The difference between a score and the mean
3. The range
4. The mean
Answer: The standard deviation
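A small worked example in Python (made-up scores) showing the z-score formula with the standard deviation in the denominator:

scores = [62, 70, 75, 81, 92]
mean = sum(scores) / len(scores)
std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
z_scores = [(s - mean) / std for s in scores]   # z = (score - mean) / standard deviation
print([round(z, 2) for z in z_scores])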
17. Which of these distributions is used for a testing hypothesis?
1. Normal Distribution
2. Chi-Squared Distribution
3. Gamma Distribution
4. Poisson Distribution
Answer: Chi-Squared Distribution
18. A statement made about a population for testing purpose is called?
1. Statistic
2. Hypothesis
3. Level of Significance
4. Test-Statistic
Answer: Hypothesis
19. If the assumed hypothesis is tested for rejection considering it to be true is called?
1. Null Hypothesis
2. Statistical Hypothesis
3. Simple Hypothesis
4. Composite Hypothesis
Answer: Null Hypothesis
20. If the null hypothesis is false then which of the following is accepted?
1. Null Hypothesis
2. Positive Hypothesis
3. Negative Hypothesis
4. Alternative Hypothesis.
Answer: Alternative Hypothesis.
21. Alternative Hypothesis is also called as?
1. Composite hypothesis
2. Research Hypothesis
3. Simple Hypothesis
4. Null Hypothesis
Answer: Research Hypothesis
*************** Data Analytics MCQs Set – 2 ***************
1. What is the minimum no. of variables/ features required to perform clustering?
1. 0
2. 1
3. 2
4. 3
Answer: 1
2. For two runs of K-Mean clustering is it expected to get same clustering results?
1. Yes
2. No
Answer: No
3. Which of the following algorithms is most sensitive to outliers?
1. K-means clustering algorithm
2. K-medians clustering algorithm
3. K-modes clustering algorithm
4. K-medoids clustering algorithm
Answer: K-means clustering algorithm
4. The discrete variables and continuous variables are two types of
1. Open end classification
2. Time series classification
3. Qualitative classification
4. Quantitative classification
Answer: Quantitative classification
5. Bayesian classifiers is
1. A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
2. Any mechanism employed by a learning system to constrain the search space of a hypothesis
3. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
4. None of these
Answer: A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
6. Classification accuracy is
1. A subdivision of a set of examples into a number of classes
2. Measure of the accuracy, of the classification of a concept that is given by a certain theory
3. The task of assigning a classification to a set of examples
4. None of these
Answer: Measure of the accuracy, of the classification of a concept that is given by a certain theory
7. Euclidean distance measure is
1. A stage of the KDD process in which new data is added to the existing selection.
2. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
3. The distance between two points as calculated using the Pythagoras theorem
4. none of above
Answer: The distance between two points as calculated using the Pythagoras theorem
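A one-line illustration in Python of the Euclidean (Pythagorean) distance between two points:

import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))   # straight-line distance

print(euclidean((0, 0), (3, 4)))   # 5.0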
8. Hybrid is
1. Combining different types of method or information
2. Approach to the design of learning algorithms that is structured along the lines of the theory of evolution.
3. Decision support systems that contain an information base filled with the knowledge of an expert formulated in terms of if-then rules.
4. none of above
Answer: Combining different types of method or information
9. Decision trees use ________, in that they always choose the option that seems the best available at that moment.
1. Greedy Algorithms
2. divide and conquer
3. Backtracking
4. Shortest path algorithm
Answer: Greedy Algorithms
10. Discovery is
1. It is hidden within a database and can only be recovered if one is given certain clues (an example IS encrypted information).
2. The process of extracting implicit, previously unknown and potentially useful information from data
3. An extremely complex molecule that occurs in human chromosomes and that carries genetic
information in the form of genes.
4. None of these
Answer: The process of extracting implicit, previously unknown and potentially useful information from data
11. Hidden knowledge referred to
1. A set of databases from different vendors, possibly using different database paradigms
2. An approach to a problem that is not guaranteed to work but performs well in most cases
3. Information that is hidden in a database and that cannot be recovered by a simple SQL query.
4. None of these
Answer: Information that is hidden in a database and that cannot be recovered by a simple SQL query.
12. Decision trees cannot handle categorical attributes with many distinct values, such as country codes for telephone numbers.
1. True
2. False
Answer: False
13. Enrichment is
1. A stage of the KDD process in which new data is added to the existing selection
2. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
3. The distance between two points as calculated using the Pythagoras theorem.
4. None of these
Answer: A stage of the KDD process in which new data is added to the existing selection
14. ________ are easy to implement and can execute efficiently even without prior knowledge of the data; they are among the most popular algorithms for classifying text documents.
1. ID3
2. Naive Bayes classifiers
3. CART
4. None of above
Answer: Naive Bayes classifiers
15. High entropy means that the partitions in classification are
1. Pure
2. Not Pure
3. Useful
4. Useless
Answer: Not Pure
16. Which of the following statements about Naive Bayes is incorrect?
1. Attributes are equally important.
2. Attributes are statistically dependent of one another given the class value.
3. Attributes are statistically independent of one another given the class value.
4. Attributes can be nominal or numeric
Answer: Attributes are statistically dependent of one another given the class value.
17. The maximum value of entropy depends on the number of classes, so if we have 8 classes, what will be the maximum entropy?
1. Max Entropy is 1
2. Max Entropy is 2
3. Max Entropy is 3
4. Max Entropy is 4
Answer: Max Entropy is 3
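A quick check in Python: with 8 equally likely classes the entropy reaches its maximum, log2(8) = 3:

import math

k = 8
p = 1 / k
h_max = -sum(p * math.log2(p) for _ in range(k))   # -sum p*log2(p) over all classes
print(h_max, math.log2(k))                         # both print 3.0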
18. Point out the wrong statement.
1. k-nearest neighbor is same as k-means
2. k-means clustering is a method of vector quantization
3. k-means clustering aims to partition n observations into k clusters
4. none of the mentioned
Answer: k-nearest neighbor is same as k-means
19. Consider the following example: "How can we divide a set of articles such that those articles have the same theme (we do not know the theme of the articles ahead of time)?" Is this:
1. Clustering
2. Classification
3. Regression
4. None of these
Answer: Clustering
20. Can we use K Mean Clustering to identify the objects in video?
1. Yes
2. No
Answer: Yes
21. Clustering techniques are ________ in the sense that the data scientist does not determine, in advance, the labels to apply to the clusters.
1. Unsupervised
2. supervised
3. Reinforcement
4. Neural network
Answer: Unsupervised
22. The ________ metric is examined to determine a reasonably optimal value of k.
1. Mean Square Error
2. Within Sum of Squares (WSS)
3. Speed
4. None of these
Answer: Within Sum of Squares (WSS)
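A minimal sketch (made-up data, assuming scikit-learn is available) of examining the Within Sum of Squares for several values of k and looking for the elbow:

import numpy as np
from sklearn.cluster import KMeans

X = np.vstack([np.random.randn(30, 2) + c for c in ([0, 0], [6, 6], [0, 8])])
for k in range(1, 7):
    wss = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_   # WSS / inertia
    print(k, round(wss, 1))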
23. If an itemset is considered frequent, then any subset of the frequent itemset must also be frequent.
1. Apriori Property
2. Downward Closure Property
3. Either 1 or 2
4. Both 1 and 2
Answer: Both 1 and 2
24. If {bread, eggs, milk} has a support of 0.15 and {bread, eggs} also has a support of 0.15, the confidence of the rule {bread, eggs} -> {milk} is
1. 0
2. 1
3. 2
4. 3
Answer: 1
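The arithmetic behind this answer, as a two-line Python check:

support_xy = 0.15                      # support of {bread, eggs, milk}
support_x = 0.15                       # support of {bread, eggs}
print(support_xy / support_x)          # confidence({bread, eggs} -> {milk}) = 1.0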
25. Confidence is a measure of how X and Y are really related rather than coincidentally happening together.
1. True
2. False
Answer: False
26. ________ recommend items based on similarity measures between users and/or items.
1. Content Based Systems
2. Hybrid System
3. Collaborative Filtering Systems
4. None of these
Answer: Collaborative Filtering Systems
27. There are ________ major classifications of Collaborative Filtering Mechanisms
1. 1
2. 2
3. 3
4. none of above
Answer: 2
28. Movie Recommendation to people is an example of
1. User Based Recommendation
2. Item Based Recommendation
3. Knowledge Based Recommendation Join:- https://t.me/AKTU_Notes_Books_Quantum
4. content based recommendation
Answer: Item Based Recommendation
29. ________ recommenders rely on an explicitly defined set of recommendation rules
1. Constraint Based
2. Case Based
3. Content Based
4. User Based
Answer: Case Based
30. Parallelized hybrid recommender systems operate dependently of one another and produce separate recommendation lists.
1. True
2. False
Answer: False
Department of IT
COURSE B.Tech., VI SEM, MCQ Assignment (2020-21) Even Semester, UNIT 1, Data Analytics (KIT601)
1. The data with no pre-defined organizational form or specific format is
a. Semi-structured data b. Unstructured data c. Structured data d. None of these
Ans. b
2. The data which can be ordered or ranked according to some relationship to one another is
a. Categorical data b. Interval data c. Ordinal data d. Ratio data
Ans. c
3. Predict the future by examining historical data, detecting patterns or relationships in these data, and then extrapolating these relationships forward in time. a. Prescriptive model b. Descriptive model c. Predictive model d. None of these
Ans. c
4. Person responsible for the genesis of the project, providing the impetus for the project and core business problem, generally provides the funding and will gauge the degree of value from the final outputs of the working team is a. Business User b. Project Sponsor c. Business Intelligence Analyst d. Data Engineer
Ans. b
5. Deep technical skills to assist with tuning SQL queries for data management, extraction and support data ingest to analytic sandbox is handled by ___________. a. Data Engineer b. Business User c. Project Sponsor d. Business Intelligence Analyst
Ans. a
6. Business domain expertise with deep understanding of the data, KPIs, key metrics and business intelligence from a reporting perspective is key role of ____________.
a. Business User b. Project Sponsor c. Business Intelligence Analyst d. Data Engineer
Ans. c
7. _____________ is concerned with uncertainty or inaccuracy of the data.
a. Volume b. Velocity c. Variety d. Veracity
Ans. d
8. What are the V's in the characteristics of Big data? a. Volume b. Velocity c. Variety d. All of these
Ans. d
9. What are the types of reporting in data analytics?
a. Canned reports b. Dashboard reports c. Alert reports d. All of above
Ans. d
10. Massive Parallel Processing (MPP) database breaks the data into independent chunks with independent disk and CPU resources.
a. True b. False
Ans. True
11. The process of exploring data and reports in order to extract meaningful insights, which can be used to better understand and improve business performance.
a. Reporting b. Analysis c. Summarizing d. None of these
Ans. b
Ans. a
12. The key components of an analytical sandbox are: (i) Business analytics (ii) Analytical sandbox platform (iii) Data access and delivery (iv) Data sources
a. True b. False
Ans. b
13. The ____________ phase: learn the business domain, including relevant history, such as whether the organization or business unit has attempted similar projects in the past, from which you can learn. Assess the resources you will have to support the project, in terms of people, technology, time, and data. Frame the business problem as an analytic challenge that can be addressed in subsequent phases. Formulate initial hypotheses (IH) to test and begin learning the data.
a. Data preparation b. Discovery c. Data Modelling d. Data Building
Ans. b
14. Which phase: Prepare an analytic sandbox, in which you can work for the duration of the project. Perform ELT and ETL to get data into the sandbox, and begin transforming the data so you can work with it and analyze it. Familiarize yourself with the data thoroughly and take steps to condition the data.
a. Data preparation b. Discovery c. Data Modelling d. Data Building
Ans. a
15. Which phase uses SQL, Python, R, or Excel to perform various data modifications and transformations?
a. Data preparation b. Data cleaning c. Data Modelling d. Data Building
Ans. a
16. By definition, Database Administrator is a person who ___________
a. Provisions and configures database environment to support the analytical needs of the working team. b. Ensure key milestones and objectives are met on time and at expected quality. c. Deep technical skills to assist with tuning SQL queries for data management, extraction and support data ingest to analytic sandbox. d. None of these
Ans. a
Ans. c
Ans. b
Ans .b
17. ETL stands for
a. Extract, Load, Transform b. Evaluate, Transform ,Load c. Extract , Loss , Transform d. None of the above
18. The phase Develop data sets for testing, training, and production purposes. Get the best environment you can for executing models and workflows, including fast hardware and parallel processing is referred to as
a. Data preparation b. Discovery c. Data Modelling d. Data Building
19. Which of the following is not a major data analysis approaches?
a. Data Mining b. Predictive Intelligence c. Business Intelligence d. Text Analytics
20. User rating given to a movie in a scale 1-10, can be considered as an attribute of type?
a. Nominal b. Ordinal c. Interval d. Ratio
Ans. d
22. The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
a. TRUE b. FALSE c. Can be true or false d. Cannot say
Ans. a
Ans. b
Ans.b
25. The Process of describing the data that is huge and complex to store and process is known as
a. Analytics b. Data mining c. Big Data d. Data Warehouse
21. Data Analysis is defined by the statistician?
a. William S. b. Hans Peter Luhn c. Gregory Piatetsky-Shapiro d. John Tukey
23. Which of the following is not a major data analysis approaches?
a. Data Mining b. Predictive Intelligence c. Business Intelligence d. Text Analytics
24. Which of the following step is performed by data scientist after acquiring the data?
a. Data Cleansing b. Data Integration c. Data Replication d. All of the mentioned
Ans. c
26. Data generated from online transactions is one of the example for volume of big data. Is this true or False. a. TRUE b. FALSE
Ans. a
27. Velocity is the speed at which the data is processed
a. TRUE b. FALSE
Ans. b
28. _____________ have a structure but cannot be stored in a database.
a. Structured b. Semi-Structured c. Unstructured d. None of these
Ans. b
29. ____________refers to the ability to turn your data useful for business.
a. Velocity b. Variety c. Value d. Volume
Ans. c
30. Value tells the trustworthiness of data in terms of quality and accuracy.
a. TRUE b. FALSE
Ans.b
NPTEL Questions
31. Analysing the data to answer why some phenomenon related to learning happened is a type of
a. Descriptive Analytics b. Diagnostic Analytics
c. Predictive Analytics d. Prescriptive Analytics
Ans. B
32. Analysing the data to answer what will happen next is a type of
a. Descriptive Analytics b. Diagnostic Analytics c. Predictive Analytics d. Prescriptive Analytics
Ans. C
33. Learning analytics at institutions/University, regional or national level is termed as
a. Educational data mining b. Business intelligence c. Academic analytics d. None of the above
Ans. C
34. Which of the following questions is not a type of Predictive Analytics?
a. What is the average score of all students in the CBSE 10th Maths Exam? b. What will be the performance of a students in next questions? c. Which courses will the student take in the next semester? d. What is the average attendance of the class over the semester
Ans A,D
35. A course instructor has data about students' attendance in her course in the past semester. Based on this data, she constructs a line graph. What type of analytics is she doing?
a. Descriptive Analytics b. Diagnostic Analytics c. Predictive Analytics d. Prescriptive Analytics
Ans. A
36. She then correlates the attendance with their final exam scores. She realizes that students who score 90% and above also have an attendance of more than 75%. What type of analytics is she doing?
a. Descriptive Analytics b. Diagnostic Analytics c. Predictive Analytics d. Prescriptive Analytics
Ans. B
38. Why should one not go for sampling?
a. Less costly to administer than a census. b. The person authorizing the study is comfortable with the sample. c. Because the research process is sometimes destructive d. None of the above
Ans. d
39. Stratified random sampling is a method of selecting a sample in which:
a. the sample is first divided into strata, and then random samples are taken from each stratum b. various strata are selected from the sample c. the population is first divided into strata, and then random samples are drawn from each stratum d. None of these alternatives is correct.
Ans. c
SET II
1. Data Analysis is defined by the statistician?
a. William S. b. Hans Peter Luhn c. Gregory Piatetsky-Shapiro d. John Tukey
Ans D
2. What is classification?
a) deciding what features to use in a pattern recognition problem b) deciding what class an input pattern belongs to c) deciding what type of neural network to use d) none of the mentioned
Ans. B
3. Data in ___________ bytes size is called Big Data.
A. Tera B. Giga C. Peta D. Meta
Ans : C
Explanation: Data in Peta bytes, i.e. 10^15 bytes in size, is called Big Data.
4. How many V's of Big Data?
A. 2 B. 3 C. 4 D. 5
Ans : D
Explanation: Big Data was defined by the “3Vs” but now there are “5Vs” of Big Data which are Volume, Velocity, Variety, Veracity, Value
5. Transaction data of the bank is?
A. structured data B. unstructured datat C. Both A and B D. None of the above
Ans : A
Explanation: Data which can be saved in tables is structured data, like the transaction data of the bank.
6. In how many forms can Big Data be found?
A. 2 B. 3 C. 4 D. 5
Ans : B
Explanation: Big Data can be found in three forms: structured, unstructured and semi-structured.
7. Which of the following are Benefits of Big Data Processing?
A. Businesses can utilize outside intelligence while taking decisions B. Improved customer service C. Better operational efficiency D. All of the above
Ans : D
Explanation: All of the above are Benefits of Big Data Processing.
8. Which of the following are incorrect Big Data Technologies?
A. Apache Hadoop B. Apache Spark C. Apache Kafka D. Apache Pytarch
Ans : D
Explanation: Apache Pytarch is not a Big Data technology.
9. The overall percentage of the world's total data that has been created just within the past two years is?
A. 80% B. 85% C. 90% D. 95%
Ans : C
Explanation: 90% of the world's total data has been created just within the past two years.
10. Apache Kafka is an open-source platform that was created by?
A. LinkedIn B. Facebook
C. Google D. IBM
Ans : A
Explanation: Apache Kafka is an open-source platform that was created by LinkedIn in the year 2011.
11. What was Hadoop named after?
A. Creator Doug Cutting’s favorite circus act B. Cuttings high school rock band C. The toy elephant of Cutting’s son D. A sound Cutting’s laptop made during Hadoop development
Ans : C
Explanation: Doug Cutting, Hadoop creator, named the framework after his child’s stuffed toy elephant. 12. What are the main components of Big Data?
A. MapReduce B. HDFS C. YARN D. All of the above
Ans : D
Explanation: All of the above are the main components of Big Data.
13. Point out the correct statement.
A. Hadoop do need specialized hardware to process the data B. Hadoop 2.0 allows live stream processing of real time data C. In Hadoop programming framework output files are divided into lines or records D. None of the above
Ans : B
Explanation: Hadoop batch processes data distributed over a number of computers ranging in the 100s and 1000s.
14. Which of the following fields come under the umbrella of Big Data?
A. Black Box Data B. Power Grid Data
C. Search Engine Data D. All of the above
Ans : D
Explanation: All options are the fields come under the umbrella of Big Data.
15. Which of the following is not an example of Social Media? 1. Twitter 2. Google 3. Instagram 4. Youtube
Ans: 2 (Google)
16. By 2025, the volume of digital data will increase to 1. TB 2. YB 3. ZB 4. EB Ans: 3 ZB
17. Data Analysis is a process of 1. inspecting data 2. cleaning data 3. transforming data 4. All of Above
Ans. 4 All of above
18. Which of the following is not a major data analysis approaches? 1. Data Mining 2. Predictive Intelligence 3. Business Intelligence 4. Text Analytics
Ans. 2 Predictive Intelligence
19. The Process of describing the data that is huge and complex to store and process is known as 1. Analytics 2. Data mining 3. Big data 4. Data warehouse
Ans. 3 Big data
20. In descriptive statistics, data from the entire population or a sample is summarized with ?
1. Integer descriptor 2. floating descriptor 3. numerical descriptor 4. decimal descriptor
Ans. 3 numerical descriptor
21. Data generated from online transactions is one of the example for volume of big data 1. TRUE 2. FALSE
TRUE
22. Velocity is the speed at which the data is processed 1. True 2. False
False
23. Value tells the trustworthiness of data in terms of quality and accuracy 1. TRUE 2. FALSE
False
24. Hortonworks was introduced by Cloudera and owned by Yahoo 1. True 2. False
False
25. ____ refers to the ability to turn your data useful for business 1. Velocity 2. variety 3. Value 4. Volume
Ans. 3 Value
26. Data Analysis is defined by the statistician? 1. William S. 2. Hans Peter Luhn 3. Gregory Piatetsky-Shapiro 4. John Tukey
Ans. 4 John Tukey
27. Files are divided into ____ sized Chunks. 1. Static 2. Dynamic 3. Fixed 4. Variable
Ans. 3 Fixed
28. _____ is an open source framework for storing data and running application on clusters of commodity hardware. 1. HDFS 2. Hadoop 3. MapReduce 4. Cloud
Ans. 2 Hadoop
29. ____ is factors considered before Adopting Big Data Technology 1. Validation 2. Verification 3. Data 4. Design
Ans. 1 Validation
30. Which among the following is not a Data mining and analytical applications? 1. profile matching 2. social network analysis 3. facial recognition 4. Filtering
Ans. 4 Filtering
31. Which storage subsystem can support massive data volumes of increasing size. 1. Extensibility 2. Fault tolerance 3. Scalability 4. High-speed I/O capacity
Ans. 3 Scalability
32. ______ is a programming model for writing applications that can process Big Data in parallel on multiple nodes.
1. HDFS 2. MAP REDUCE 3. HADOOP 4. HIVE Ans. MAP REDUCE
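A toy word-count sketch in pure Python (illustrative only) showing the Map, shuffle/group and Reduce steps of the MapReduce model:

from collections import defaultdict
from itertools import chain

docs = ["big data is big", "map reduce processes big data"]
mapped = chain.from_iterable(((w, 1) for w in d.split()) for d in docs)   # Map: emit (word, 1)
groups = defaultdict(list)
for key, value in mapped:                                                 # Shuffle: group by key
    groups[key].append(value)
print({word: sum(values) for word, values in groups.items()})             # Reduce: sum per word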
33. How many main statistical methodologies are used in data analysis?
A. 2 B. 3 C. 4 D. 5
Ans : A
Explanation: In data analysis, two main statistical methodologies are used: descriptive statistics and inferential statistics.
34. The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
A. TRUE B. FALSE C. Can be true or false D. Can not say
Ans : A
Explanation: The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
35. The branch of statistics which deals with development of particular statistical methods is classified as 1. industry statistics 2. economic statistics 3. applied statistics 4. applied statistics
Ans. applied statistics
36. Point out the correct statement. a) Descriptive analysis is first kind of data analysis performed b) Descriptions can be generalized without statistical modelling
c) Description and Interpretation are same in descriptive analysis d) None of the mentioned
Answer: b. Explanation: Descriptive analysis describes a set of data.
37. What are the five V’s of Big Data?
A. Volume
B. Velocity
C. Variety
D. All the above
Answer: Option D
38. What are the main components of Big Data?
A. MapReduce
B. HDFS
C. YARN
D. All of these
Answer: Option D
39. What are the different features of Big Data Analytics?
A. Open-Source
B. Scalability
C. Data Recovery
D. All the above
Answer: Option D
40. Which of the following refers to the problem of finding abstracted patterns (or structures) in the unlabeled data?
A. Supervised learning
B. Unsupervised learning
C. Hybrid learning
D. Reinforcement learning
Answer: B
Explanation: Unsupervised learning is a type of machine learning algorithm that is generally used to find the hidden structured and patterns in the given unlabeled data.
41. Which one of the following refers to querying the unstructured textual data?
A. Information access
B. Information update
C. Information retrieval
D. Information manipulation
Answer: C
Explanation: Information retrieval refers to querying the unstructured textual data. We can also understand information retrieval as an activity (or process) in which the tasks of obtaining information from system recourses that are relevant to the information required from the huge source of information.
42. For what purpose, the analysis tools pre-compute the summaries of the huge amount of data?
A. In order to maintain consistency
B. For authentication
C. For data access
D. To obtain the queries response
Answer: d
Explanation: Whenever a query is fired, its response must be produced very quickly. So, for the query response, the analysis tools pre-compute summaries of the huge amount of data in advance. For example, when you write a keyword in Google search, Google's analytical tools use pre-computed summaries of large amounts of data to provide a quick output related to the keyword you have written.
43. Which one of the following statements is not correct about the data cleaning?
a. It refers to the process of data cleaning
b. It refers to the transformation of wrong data into correct data
c. It refers to correcting inconsistent data
d. All of the above
Answer: d
Explanation: Data cleaning is a kind of process that is applied to data set to remove the noise from the data (or noisy data), inconsistent data from the given data. It also involves the process of transformation where wrong data is transformed into the correct data as well. In other words, we can also say that data cleaning is a kind of pre-process in which the given set of data is prepared for the data warehouse.
44. Any data with unknown form or the structure is classified as _ data. a. Structured b. Unstructured c. Semi-structured d. None of above Ans. b
45.____ means relating to the issuing of reports. a. Analysis b. Reporting c. Reporting and Analysis d. None of the above
Ans. b
46. Veracity involves the reliability of the data; this is ________ due to the numerous data sources of big data. a) Easy and difficulty b) Easiness c) Demanding d) None of these
Ans. c 47. ____is a process of defining the measurement of a phenomenon that is not directly measurable, though its existence is implied by other phenomena. a. Data preparation b. Model planning c. Communicating results d. Operationalization
Ans. d
48. _____data is data whose elements are addressable for effective analysis.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. a
49. ______data is information that does not reside in a relational database but that have some organizational properties that make it easier to analyze.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. b
50. ______data is a data which is not organized in a predefined manner or does not have a predefined data model, thus it is not a good fit for a mainstream relational database.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. c
51. There are ___ types of big data.
a. 2 b. 3 c. 4 d. 5
Ans. b
52. Google search is an example of _________ data.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. c
Department of IT
COURSE B.Tech., VI SEM, MCQ Assignment (2020-21) Even Semester UNIT 2 DataAnalytics(KIT601)
1. Maximum aposteriori classifier is also known as: a. Decision tree classifier b. Bayes classifier c. Gaussian classifier d. Maximum margin classifier
Ans. B
2. Which of the following sentence is FALSE regarding regression?
a. It relates inputs to outputs. b. It is used for prediction. c. It may be used for interpretation. d. It discovers causal relationships.
Ans. d
3. Suppose you are working on stock market prediction, and you would like to predict the price of a particular stock tomorrow (measured in dollars).
You want to use a learning algorithm for this.
a. Regression b. Classification c. Clustering d. None of these
Ans. a
4. In binary logistic regression:
a. The dependent variable is divided into two equal subcategories. b. The dependent variable consists of two categories. c. There is no dependent variable. d. The dependent variable is continuous.
Ans. b
5. A fair six-sided die is rolled twice. What is the probability of getting 4 on the first roll and not getting 6 on the second roll?
a. 1/36 b. 5/36 c. 1/12 d. 1/9
Ans. b
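Worked check: the two rolls are independent, so P(4 on the first roll) x P(not 6 on the second roll) = (1/6) x (5/6) = 5/36, which is option b.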
6. The parameter β0 is termed as intercept term and the parameter β1 is termed as slope parameter. These parameters are usually called as _________
a Regressionists b. Coefficients c. Regressive d. Regression coefficients
Ans. d
7. ________ is a simple approach to supervised learning. It assumes that the dependence of Y on X1, X2… Xp is linear.
a. Gradient Descent b. Linear regression
c. Logistic regression d. Greedy algorithms
Ans. b
8. What makes the interpretation of conditional effects extra challenging in logistic regression?
a. It is not possible to model interaction effects in logistic regression b. The maximum likelihood estimation makes the results unstable c. The conditional effect is dependent on the values of all X-variables d. The results has to be raised by its natural logarithm.
Ans. c 9. If there were a perfect positive correlation between two interval/ratio variables, the Pearson's r test would give a correlation coefficient of:
a. - 0.328 b. +1 c. +0.328 d. – 1
Ans.b
10. Logistic Regression transforms the output probability to in a range of [0, 1]. Which of the following function is used for this purpose?
a. Sigmoid b. Mode c. Square d. All of these
Ans.a
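For reference, a minimal Python sketch of the sigmoid function used by logistic regression to squash scores into [0, 1] (illustrative only; the sample inputs are arbitrary):

import math

def sigmoid(z):
    # Maps any real-valued score z into the (0, 1) range,
    # which is why logistic regression uses it to output probabilities.
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(-4), sigmoid(0), sigmoid(4))   # ~0.018, 0.5, ~0.982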
12. Generally which of the following method(s) is used for predicting continuous dependent variable?
1. Linear Regression 2. Logistic Regression
a. 1 and 2
b. only 1 c. only 2 d. None of these
Ans.b
13. Mean of the set of numbers {1, 2, 3, 4, 5} is?
a. 2 b. 3 c. 4 d. 5
Ans.b
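Worked check: (1 + 2 + 3 + 4 + 5) / 5 = 15 / 5 = 3, which is option b.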
14. Name of a movie, can be considered as an attribute of type?
a. Nominal
b. Ordinal
c. Interval
d. Ratio
Ans.a
15. Let A be an example, and C be a class. The probability P(C) is known as:
a. Apriori probability
b. Aposteriori probability
c. Class conditional probability
d. None of the above
Ans.a
16. Consider two binary attributes X and Y. We know that the attributes are independent and probability P(X=1) = 0.6, and P(Y=0) = 0.4. What is the probability that both X and Y have values 1?
a. 0.06 b. 0.16 c. 0.26 d. 0.36
Ans. d
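Worked check: P(Y = 1) = 1 - P(Y = 0) = 0.6, and by independence P(X = 1, Y = 1) = P(X = 1) x P(Y = 1) = 0.6 x 0.6 = 0.36, which is option d.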
17. In regression the output is a. Discrete b. Continuous c. Continuous and always lie in same range d. May be discrete and continuous
Ans. b
18. The probabilistic model that finds the most probable prediction using the training data and space of hypotheses to make a prediction for a new data instance.
a. Concept learning b. Bayes optimal classifier c. EM algorithm d. Logistic regression
Ans. b
19. State whether the following condition is true or not: “In Bayesian theorem, it is important to find the probability of both the events occurring simultaneously”
a. True b. False
Ans. b 20. If the correlation coefficient is a positive value, then the slope of the regression line
a. can be either negative or positive
b. must also be positive c. can be zero d. cannot be zero
Ans. b
21. Which of the following is true about Naive Bayes?
a. Assumes that all the features in a dataset are equally important b. Assumes that all the features in a dataset are independent c. Assumes that all the features in a dataset are equally important and are independent. d. None of the above options
Ans. c
22. Previous probabilities in Bayes Theorem that are changed with help of new available information are classified as _______
a. independent probabilities b. posterior probabilities c. interior probabilities d. dependent probabilities
Ans. b
23. Which of the following methods do we use, to find the best fit line for data in Linear Regression?
a. Least Square Error b. Maximum Likelihood c. Logarithmic Loss d. Both A and B
Ans. a
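For reference, a minimal Python sketch of the least-squares fit for a simple linear regression line (illustrative only; the sample data and names are assumptions, not taken from the question bank):

def least_squares_fit(xs, ys):
    # Ordinary least squares for y = b0 + b1*x:
    # b1 = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2), b0 = mean_y - b1*mean_x
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
    b0 = mean_y - b1 * mean_x
    return b0, b1

print(least_squares_fit([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]))   # intercept ~0.15, slope ~1.94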
24. What is the consequence between a node and its predecessors while creating Bayesian network?
a. Conditionally dependent b. Dependent c. Conditionally independent d. Both a & b
Ans. c 25. Bayes rule can be used to __________conditioned on one piece of evidence.
a. Solve queries b. Answer probabilistic queries c. Decrease complexity of queries d. Increase complexity of queries
Ans.b
26. Which of the following options is/are correct in reference to Bayesian Learning?
a. New instances can be classified by combining the predictions of multiple hypotheses, weighted by their probabilities. b. Bayesian methods can accommodate hypotheses that make probabilistic predictions. c. Each observed training example can incrementally decrease or increase the estimated probability that a hypothesis is correct. d. All of the mentioned
Ans. d
27. When is the cell said to be fired? a. if the potential of the body reaches a steady threshold value b. if there is an impulse reaction c. during the upbeat of heart d. none of the mentioned
Ans.a 28. Which of the following is true about regression analysis?
a. answering yes/no questions about the data b. estimating numerical characteristics of the data c. modeling relationships within the data d. describing associations within the data
Ans.c
29. Suppose you are building a SVM model on data X. The data X can be error prone which means that you should not trust any specific data point too much. Now think that you want to build a SVM model which has quadratic kernel function of polynomial degree 2 that uses Slack variable C as one of its hyper parameter. Based upon that give the answer for following question. What would happen when you use very large value of C?
a. We can still classify data correctly for given setting of hyper parameter C b. We cannot classify data correctly for given setting of hyper parameter C. c. Can’t Say
d. None of these
Ans. a
30. What is/are true about kernel in SVM?
(a) Kernel functions map low dimensional data to high dimensional space (b) It's a similarity function
a. Kernel functions map low dimensional data to high dimensional space b. It's a similarity function c. Kernel functions map low dimensional data to high dimensional space and it's a similarity function d. None of these
Ans. c
31. Suppose you have trained an SVM with a linear decision boundary and, after training, you correctly infer that your SVM model is underfitting. Which of the following options would you be more likely to consider for the next iteration of the SVM? a. You want to increase your data points. b. You want to decrease your data points. c. You will try to calculate more variables. d. You will try to reduce the features.
Ans. c
32. Suppose you are using RBF kernel in SVM with high Gamma value. What does this signify?
a. The model would consider even far away points from hyperplane for modeling b. The model would consider only the points close to the hyperplane for modeling. c. The model would not be affected by distance of points from hyperplane for modeling. d. None of these
Ans.b
33. Which of the following can only be used when training data are linearly separable?
a. Linear Logistic Regression. b. Linear Soft margin SVM c. Linear hard-margin SVM d. Parzen windows.
Ans.c
34. Using the kernel trick, one can get non-linear decision boundaries using algorithms designed originally for linear models.
a. True b. False
Ans. a
35. Support vectors are the data points that lie closest to the decision surface.
a. True b. False
Ans. a
36. Which of the following statement is true for a multilayered perceptron?
a. Output of all the nodes of a layer is input to all the nodes of the next layer b. Output of all the nodes of a layer is input to all the nodes of the same layer c. Output of all the nodes of a layer is input to all the nodes of the previous layer d. Output of all the nodes of a layer is input to all the nodes of the output layer
Ans. a
37. Which of the following is/are true regarding an SVM?
a. For two dimensional data points, the separating hyperplane learnt by a linear SVM will be a straight line. b. In theory, a Gaussian kernel SVM cannot model any complex separating hyperplane. c. For every kernel function used in a SVM, one can obtain an equivalent closed form basis expansion. d. Overfitting in an SVM is not a function of number of support vectors.
Ans. a
38. The function of distance that is used to determine the weight of each training example in instance based learning is known as______________
a. Kernel Function b. Linear Function c. Binomial distribution d. All of the above
Ans. a 39. What is the name of the function in the following statement “A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain value, it outputs a 1, otherwise it just outputs a 0”?
a. Step function b. Heaviside function c. Logistic function d. Binary function
Ans. b
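A minimal Python sketch of the perceptron rule described in the question, using a step (Heaviside) activation; the AND-gate weights and threshold below are assumptions chosen purely for illustration:

def heaviside(x):
    # Step activation: output 1 once the weighted sum reaches the threshold, else 0.
    return 1 if x >= 0 else 0

def perceptron(inputs, weights, threshold):
    # Adds up all the weighted inputs and compares the total against a threshold.
    total = sum(w * x for w, x in zip(weights, inputs))
    return heaviside(total - threshold)

# Toy AND gate: fires (outputs 1) only when both inputs are 1.
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", perceptron([a, b], [1, 1], 1.5))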
40. Which of the following is true? (i) On average, neural networks have higher computational rates than conventional computers. (ii) Neural networks learn by example. (iii) Neural networks mimic the way the human brain works.
a. All of the mentioned are true b. (ii) and (iii) are true c. (i) and (ii) are true d. Only (i) is true
Ans. a
41. Which of the following is an application of NN (Neural Network)?
a. Sales forecasting b. Data validation c. Risk management d. All of the mentioned
Ans. d
42. A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain value, it outputs a 1, otherwise it just outputs a 0.
a. True b. False
Ans. a
43. In what ways can output be determined from activation value in ANN?
a. Deterministically
b. Stochastically c. both deterministically & stochastically d. none of the mentioned
Ans. c
45. In ANN, the amount of output of one unit received by another unit depends on what?
a. output unit b. input unit c. activation value d. weight
Ans. d
46. Function of dendrites in ANN is
a. receptors b. transmitter c. both receptor & transmitter d. none of the mentioned
Ans. a
47. Which of the following is true? (i) On average, neural networks have higher computational rates than conventional computers. (ii) Neural networks learn by example. (iii) Neural networks mimic the way the human brain works.
a. All of the mentioned are true b. (ii) and (iii) are true c. (i), (ii) and (iii) are true d. Only (i) is true
Ans. a 48. What is the name of the function in the following statement “A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain value, it outputs a 1, otherwise it just outputs a 0”?
a. Step function b. Heaviside function
c. Logistic function d. Binary function
Ans. b
49. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the constant of proportionality equal to 2. The inputs are 4, 10, 5 and 20 respectively. The output will be
a. 238 b. 76 c. 119 d. 123
Ans. a
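Worked check: output = 2 x (1x4 + 2x10 + 3x5 + 4x20) = 2 x (4 + 20 + 15 + 80) = 2 x 119 = 238, which is option a.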
50. Which of the following are real world applications of the SVM?
a. Text and Hypertext Categorization b. Image Classification c. Clustering of News Articles d. All of the above
Ans.d
51. Support vector machine may be termed as:
a. Maximum apriori classifier
b. Maximum margin classifier
c. Minimum apriori classifier
d. Minimum margin classifier
Ans.b
52. What is the purpose of the axon? a. receptors b. transmitter c. transmission d. none of the mentioned
Ans. c
53. The model developed from sample data having the form ŷ = b0 + b1x is known as: Ans. C – estimated regression equation
54. In regression analysis, which of the following is not a required assumption about the error term ε?
Ans. A – The expected value of the error term is one
55. ____________ are algorithms that learn from their more complex environments (hence eco) to generalize, approximate and simplify solution logic.
a. Fuzzy Relational DB
b. Ecorithms
c. Fuzzy Set
d. None of the mentioned
Ans. b
56. The truth values of traditional set theory is ____________ and that of fuzzy set is __________
a. Either 0 or 1, between 0 & 1
b. Between 0 & 1, either 0 or 1
c. Between 0 & 1, between 0 & 1
d. Either 0 or 1, either 0 or 1
Ans. a
57. What is the form of Fuzzy logic?
a. Two-valued logic
b. Crisp set logic
c. Many-valued logic
d. Binary set logic
Ans. c
58. Fuzzy logic is usually represented as ___________
a. IF-THEN rules
b. IF-THEN-ELSE rules
c. Both IF-THEN-ELSE rules & IF-THEN rules
d. None of the mentioned
Ans. a
59. ______________ is/are the way/s to represent uncertainty.
a. Fuzzy Logic
b. Probability
c. Entropy
d. All of the mentioned
Ans.d
60. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the following.
a. AND
b. OR
c. NOT
d. All of mentioned
Ans. d
61. The values of the set membership is represented by ___________
a. Discrete Set
b. Degree of truth
c. Probabilities
d. Both Degree of truth & Probabilities
Ans. b
62. Fuzzy logic is extension of Crisp set with an extension of handling the concept of Partial Truth.
a. True
b. False
Ans. a
SET II
1. Sentiment Analysis is an example of 1. Regression 2. Classification 3. clustering 4. Reinforcement Learning
1. 1, 2 and 4 2. 1, 2 and 3 3. 1 and 3 4. 1 and 2 Ans. 1, 2 and 4
2. The self-organizing maps can also be considered as the instance of _________ type of learning.
A. Supervised learning B. Unsupervised learning C. Missing data imputation D. Both A & C
Answer: B Explanation: The Self Organizing Map (SOM), or the Self Organizing Feature Map is a kind of Artificial Neural Network which is trained through unsupervised learning.
3. The following statement can be considered as an example of _________
Suppose one wants to predict the number of newborns according to the size of storks' population by performing supervised learning
A. Structural equation modeling B. Clustering C. Regression D. Classification
Answer: C
Explanation: The above-given statement can be considered as an example of regression. Therefore the correct answer is C.
4. In the example predicting the number of newborns, the final number of total newborns can be considered as the _________
A. Features B. Observation C. Attribute
D. Outcome
Answer: D
Explanation: In the example of predicting the total number of newborns, the final number of newborns is the outcome; it is what the model's output represents.
5. Which of the following statement is true about the classification?
A. It is a measure of accuracy B. It is a subdivision of a set C. It is the task of assigning a classification D. None of the above
Answer: B
Explanation: The term "classification" refers to the classification of the given data into certain sub-classes or groups according to their similarities or on the basis of the specific given set of rules.
6. Which one of the following correctly refers to the task of the classification?
A. A measure of the accuracy, of the classification of a concept that is given by a certain theory B. The task of assigning a classification to a set of examples C. A subdivision of a set of examples into a number of classes D. None of the above
Answer: B
Explanation: The task of classification refers to assigning a class (label) to a set of examples. Therefore the correct answer is B.
7. _____is an observation which contains either very low value or very high value in comparison to other observed values. It may hamper the result, so it should be avoided. a. Dependent Variable b. Independent Variable c. Outlier Variable d. None of the above Ans. c
8. _______is a type of regression which models the non-linear dataset using a linear model.
a. Polynomial Regression b. Logistic Regression c. Linear Regression d. Decision Tree Regression
Ans. a
9. The prediction of the weight of a person when his height is known, is a simple example of regression. The function used in R language is_____.
a. lm() b. print() c. predict() d. summary()
Ans. c
10. The following is the syntax of the lm() function in multiple regression.
lm(y ~ x1+x2+x3...., data) a. y is predictor and x1,x2,x3 are the dependent variables. b. y is dependent and x1,x2,x3 are the predictors. c. data is predictor variable. d. None of the above.
Ans. b
11. _______is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph.
a. A Bayesian network b. Bayes Network c. Bayesian Model d. All of the above
Ans. d
12. In support vector regression, _____is a function used to map lower dimensional data into higher dimensional data
A) Boundary line B) Kernel C) Hyper Plane D) Support Vector Ans. B
13. If the independent variables are highly correlated with each other, then such a condition is called ___________ a) outlier b) Multicollinearity c) under fitting d) independent variable
Ans. b
14. The Bayesian network graph does not contain any cyclic graph. Hence, it is known as a ____ or_____.
a. Directed Acyclic Graph or DAG b. Directed Cyclic Graph or DCG. c. Both the above. d. None of the above.
Ans. a
15. The hyperplane with maximum margin is called the ______ hyperplane. a. Non-optimal b. Optimal c. None of the above d. Requires one more option
Ans. b
16. One more _____ is needed for non-linear SVM.
a. Dimension b. Attribute c. Both the above d. None of the above
Ans. a
17. A subset of the dataset used to train the machine learning model, for which we already know the output, is called the
a. Training set b. Test set c. Both the above
d. None of the above
Ans. a
18. ______ is the final step of data preprocessing in machine learning. It is a technique to standardize the independent variables of the dataset to a specific range. In _____, we put our variables in the same range and on the same scale so that no variable dominates the others.
a. Feature Sampling b. Feature Scaling c. None of the above d. Both the above
Ans. b
19. Principal components analysis (PCA) is a statistical technique that allows identifying underlying linear patterns in a data set so it can be expressed in terms of other data set of a significantly ____ dimension without much loss of information. a. Lower b. Higher c. Equal d. None of the above
Ans. a
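A minimal Python/NumPy sketch of PCA via the covariance matrix (illustrative only; the 2-D sample points are assumptions), showing how data can be expressed in a lower-dimensional space without much loss of information:

import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
              [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1]])
Xc = X - X.mean(axis=0)                 # centre each feature
cov = np.cov(Xc, rowvar=False)          # covariance matrix of the features
eigvals, eigvecs = np.linalg.eigh(cov)  # eigen-decomposition (ascending eigenvalues)
pc1 = eigvecs[:, np.argmax(eigvals)]    # direction of maximum variance
projected = Xc @ pc1                    # project the 2-D data onto 1 dimension
print(projected)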
20. _____ units which are internal to the network and do not directly interact with the environment. a. Input b. Output c. Hidden d. None of the above
Ans. c
21. In a ____ network there is an ordering imposed on the nodes of the network: if there is a connection from unit a to unit b, then there cannot be a connection from b to a. a. Feedback b. Feed-Forward c. None of the above
Ans. b
22. _____ contains the multiple logical values and these values are the truth values of a variable or problem between 0 and 1. This concept was introduced by Lofti Zadeh in 1965 a. Boolean Logic b. Fuzzy Logic c. None of the above
Ans. b
23. ______is a module or component, which takes the fuzzy set inputs generated by the Inference Engine, and then transforms them into a crisp value. a. Fuzzification b. Defuzzification c. Inference Engine d. None of the above
Ans. b
24. The most common application of time series analysis is forecasting future values of a numeric value using the ______ structure of the ____ a. Shares,data b. Temporal,data c. Permanent,data d. None of these
Ans. b
25. Identify the component of a time series a. Temporal b. Shares c. Trend d. Policymakers
Ans. c
26. Predictable pattern that recurs or repeats over regular intervals. Seasonality is often observed within a year or less: This define the term__________ a. Trend b. Seasonality c. Cycles d. Recession
Ans. b
27. ________Learning uses a training set that consists of a set of pattern pairs: an input pattern and the corresponding desired (or target) output pattern. The desired output may be regarded as the ‘network’s ‘teacher” for that input a. Unsupervised b. Supervised c. Modular d. Object
Ans. b
28. The _______ perceptron consists of a set of input units connected by a single layer of weights to a set of output units a. Multi layer b. Single layer c. Hidden layer d. None of these
Ans. b
29. If we add another layer of weights to a single-layer perceptron, we find that there is a new set of units that are neither input nor output units; a network with more than one such layer of weights is called a a. Single layer perceptron b. Multi layer perceptron c. Hidden layer d. None of these
Ans. b
30. Patterns that repeat over a certain period of time a. Seasonal b. Trend c. None of the above d. Both of the above
Ans. a
31. Which of the following is characteristic of best machine learning method ?
a. Fast b. Accuracy c. Scalable d. All of the Mentioned
Ans. d
32. Supervised learning differs from unsupervised clustering in that supervised learning requires a. at least one input attribute. b. input attributes to be categorical. c. at least one output attribute. d. output attributes to be categorical. Ans. c
33. Supervised learning and unsupervised clustering both require at least one a. hidden attribute. b. output attribute. c. input attribute. d. categorical attribute. Ans. c
34. Which statement is true about prediction problems? a. The output attribute must be categorical. b. The output attribute must be numeric. c. The resultant model is designed to determine future outcomes. d. The resultant model is designed to classify current behavior. Ans. c
35. Which statement is true about neural network and linear regression models? a. Both models require input attributes to be numeric. b. Both models require numeric attributes to range between 0 and 1. c. The output of both models is a categorical attribute value. d. Both techniques build models whose output is determined by a linear sum of weighted input attribute values. Ans. a
36. A feed-forward neural network is said to be fully connected when a. all nodes are connected to each other. b. all nodes at the same layer are connected to each other. c. all nodes at one layer are connected to all nodes in the next higher layer. d. all hidden layer nodes are connected to all output layer nodes. Ans. c
37. Machine learning techniques differ from statistical techniques in that machine learning methods a. typically assume an underlying distribution for the data.
b. are better able to deal with missing and noisy data. c. are not able to explain their behavior. d. have trouble with large-sized datasets. Ans. b
38. This supervised learning technique can process both numeric and categorical input attributes. a. linear regression b. Bayes classifier c. logistic regression d. backpropagation learning Ans. b
39. This technique associates a conditional probability value with each data instance. a. linear regression b. logistic regression c. simple regression d. multiple linear regression Ans. b
40. Logistic regression is a ________ regression technique that is used to model data having a _____ outcome. a. linear, numeric b. linear, binary c. nonlinear, numeric d. nonlinear, binary Ans. d
41. Which of the following problems is best solved using time-series analysis? a. Predict whether someone is a likely candidate for having a stroke. b. Determine if an individual should be given an unsecured loan. c. Develop a profile of a star athlete. d. Determine the likelihood that someone will terminate their cell phone contract.
Ans. d
42. Which of the following is true about Naive Bayes? a. Assumes that all the features in a dataset are equally important b. Assumes that all the features in a dataset are independent
c. Both A and B d. None of the above options Ans. c 43. Simple regression assumes a __________ relationship between the input attribute and output attribute. a. linear b. quadratic c. reciprocal d. inverse Ans. a
44. With Bayes classifier, missing data items are a. treated as equal compares. b. treated as unequal compares. c. replaced with a default value. d. ignored. 45. What is Machine learning? a. The autonomous acquisition of knowledge through the use of computer programs b. The autonomous acquisition of knowledge through the use of manual programs c. The selective acquisition of knowledge through the use of computer programs d. The selective acquisition of knowledge through the use of manual programs
Ans: a
46. Automated vehicle is an example of ______ a. Supervised learning b. Unsupervised learning c. Active learning d. Reinforcement learning
Ans: a
47. Multilayer perceptron network is a. Usually, the weights are initially set to small random values b. A hard-limiting activation function is often used c. The weights can only be updated after all the training vectors have been presented d. Multiple layers of neurons allow for less complex decision boundaries than a single layer
Ans: a
48. Neural networks a. optimize a convex cost function b. cannot be used for regression as well as classification c. always output values between 0 and 1 d. can be used in an ensemble
Ans: d
49. In neural networks, nonlinear activation functions such as sigmoid, tanh, and ReLU a. speed up the gradient calculation in backpropagation, as compared to linear units b. are applied only to the output units c. help to learn nonlinear decision boundaries d. always output values between 0 and 1
Ans: c
50. Which of the following is a disadvantage of decision trees?
a. Factor analysis b. Decision trees are robust to outliers c. Decision trees are prone to be overfit d. None of the above
Ans: c
51. Back propagation is a learning technique that adjusts weights in the neural network by propagating weight changes. a. Forward from source to sink b. Backward from sink to source c. Forward from source to hidden nodes d. Backward from sink to hidden nodes
Ans: b
52. Identify the following activation function: φ(V) = Z + 1 / (1 + exp(-X*V + Y)), where Z, X, Y are parameters
a. Step function b. Ramp function c. Sigmoid function
d. Gaussian function
Ans: c
53. An artificial neuron receives n inputs x1, x2, x3, ..., xn with weights w1, w2, ..., wn attached to the input links. The weighted sum _________________ is computed and passed on to a non-linear filter Φ called the activation function to release the output. a. Σ wi b. Σ xi c. Σ wi + Σ xi d. Σ wi * xi
Ans: d
54. With Bayes classifier, missing data items are a. treated as equal compares. b. treated as unequal compares. c. replaced with a default value. d. ignored.
Ans:b
55. Machine learning techniques differ from statistical techniques in that machine learning methods a. typically assume an underlying distribution for the data. b. are better able to deal with missing and noisy data. c. are not able to explain their behavior. d. have trouble with large-sized datasets.
Ans: b
56. Which of the following is true about Naive Bayes?
a. Assumes that all the features in a dataset are equally important b. Assumes that all the features in a dataset are independent c. Both a and b d. None of the above options
Ans: c
57. How many terms are required for building a Bayes model?
a. 1 b. 2 c. 3 d. 4
Ans: c
58. What does the Bayesian network provides? a. Complete description of the domain b. Partial description of the domain c. Complete description of the problem d. None of the mentioned
Ans: a
59. How the Bayesian network can be used to answer any query? a. Full distribution b. Joint distribution c. Partial distribution d. All of the mentioned
Ans: b
60. In which of the following learning the teacher returns reward and punishment to learner? a. Active learning b. Reinforcement learning c. Supervised learning d. Unsupervised learning
Ans: b
61. Which of the following is the model used for learning? a. Decision trees b. Neural networks c. Propositional and FOL rules d. All of the mentioned
Ans: d
Department of IT
COURSE B.Tech., VI SEM, MCQ Assignment (2020-21) Even Semester UNIT 3 DataAnalytics(KIT601)
Q.1 Which attribute is _not_ indicative for data streaming?
A) Limited amount of memory
B) Limited amount of processing time
C) Limited amount of input data
D) Limited amount of processing power
Ans. C)
Q.2 Which of the following statements about data streaming is true?
A) Stream data is always unstructured data.
B) Stream data often has a high velocity.
C) Stream elements cannot be stored on disk.
D) Stream data is always structured data.
Ans. B
Q.3 What is the main difference between standard reservoir sampling and min-wise sampling?
A) Reservoir sampling makes use of randomly generated numbers whereas minwise sampling does not.
B) Min-wise sampling makes use of randomly generated numbers whereas reservoir sampling does not.
C) Reservoir sampling requires a stream to be processed sequentially, whereas minwise does not.
D) For larger streams, reservoir sampling creates more accurate samples than minwise sampling.
Ans. C)
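A minimal Python sketch of standard reservoir sampling (illustrative only; the stream and sample size are assumptions), showing the single sequential pass and the use of randomly generated indices:

import random

def reservoir_sample(stream, k):
    # Keeps a uniform random sample of size k from a stream of unknown length,
    # reading each element exactly once, in order.
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = random.randint(0, i)   # random index in [0, i]
            if j < k:
                reservoir[j] = item    # replace an existing sample with decreasing probability
    return reservoir

print(reservoir_sample(range(1, 1001), 10))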
Q.4 A Bloom filter guarantees no
A) false positives
B) false negatives
C) false positives and false negatives
D) false positives or false negatives, depending on the Bloom filter type
Ans. B)
Q.5 Which of the following statements about standard Bloom filters is correct?
A) It is possible to delete an element from a Bloom filter.
B) A Bloom filter always returns the correct result.
C) It is possible to alter the hash functions of a full Bloom filter to create more space.
D) A Bloom filter always returns TRUE when testing for a previously added element.
Ans. D)
Q.6 The DGIM algorithm was developed to estimate the count of 1's that occur within the last k bits of a stream window N. Which of the following statements is true about the estimate of the number of 0's based on DGIM?
A) The number of 0's cannot be estimated at all.
B) The number of 0's can be estimated with a maximum guaranteed error.
C) To estimate the number of 0s and 1s with a guaranteed maximum error, DGIM has to be employed twice, one creating buckets based on 1's, and once created buckets based on 0's.
D) None of these
Ans. B)
Q.7 Which of the following statements about the standard DGIM algorithm are false?
A)DGIM operates on a time-based window.
B) DGIM reduces memory consumption through a clever way of storing counts.
C) In DGIM, the size of a bucket is always a power of two.
D) The maximum number of buckets has to be chosen beforehand.
Ans. D)
Q.8 Which of the following statements about the standard DGIM algorithm are false?
A)DGIM operates on a time-based window.
B) DGIM reduces memory consumption through a clever way of storing counts.
C) In DGIM, the size of a bucket is always a power of two.
D) The buckets contain the count of 1's and each 1's specific position in the stream
Ans. D)
Q.9 What are DGIM’s maximum error boundaries? A) DGIM always underestimates the true count; at most by 25%
B) DGIM either underestimates or overestimates the true count; at most by 50%
C) DGIM always overestimates the count; at most by 50%
D) DGIM either underestimates or overestimates the true count; at most by 25%
Ans. B)
Q.10 Which algorithm should be used to approximate the number of distinct elements in a data stream?
A) Misra-Gries
B) Alon-Matias-Szegedy
C) DGIM
D) None of the above
Ans. D)
Q.11 Which algorithm should be used to approximate the number of distinct elements in a data stream?
A) Misra-Gries
B) Alon-Matias-Szegedy
C) DGIM
D) Flajolet and Martin
Ans. D)
Q.12 Which of the following streaming windows show valid bucket representations according to the DGIM rules?
A) 1 0 1 1 1 0 1 0 1 1 1 1 0 1 0 1
B) 1 0 1 1 1 0 0 0 0 1 1 0 0 0 1 0 1 1 1 0 0 1
C) 1 1 1 1 0 0 1 1 1 0 1 0 1
D) 1 0 1 1 0 0 0 1 0 1 1 1 0 1 1 0 0 1 0 1 1
Ans. D)
Q.13 For which of the following streams is the second-order moment F2 greater than 45?
A) 10 5 5 10 10 10 1 1 1 10
B) 10 10 10 10 10 5 5 5 5 5
C) 1 1 1 1 1 5 10 10 5 1
D) None of these
Ans. B)
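Worked check: the second-order moment is F2 = sum over the distinct elements of (count)^2. For stream A the counts are 10 -> 5, 5 -> 2, 1 -> 3, so F2 = 25 + 4 + 9 = 38; for stream B the counts are 10 -> 5 and 5 -> 5, so F2 = 25 + 25 = 50 > 45; for stream C, F2 = 36 + 4 + 4 = 44. Hence only option B exceeds 45 (and in the next question, the all-10 stream gives F2 = 100, so B is again the answer).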
Q.14 For which of the following streams is the second-order moment F2 greater than 45?
A) 10 5 5 10 10 10 1 1 1 10
B) 10 10 10 10 10 10 10 10 10 10
C) 1 1 1 1 1 5 10 10 5 1
D) None of these
Ans. B)
Q 15 : In Bloom filter an array of n bits is initialized with
A) all 0s
B) all 1s
C) half 0s and half 1s
D) all -1
Ans. A)
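A minimal Python sketch of a Bloom filter (illustrative only; the bit-array size, number of hash functions, and sample keys are assumptions): the n-bit array starts as all 0s, inserts set k hashed positions to 1, and lookups can give false positives but never false negatives:

import hashlib

class BloomFilter:
    def __init__(self, n_bits=1024, n_hashes=3):
        self.n = n_bits
        self.k = n_hashes
        self.bits = [0] * n_bits       # array of n bits, initialised with all 0s

    def _positions(self, item):
        # Derive k bit positions from k salted hashes of the item.
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.n

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        # Always True for added items (no false negatives); may be True for others (false positives).
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
bf.add("user42@example.com")
print(bf.might_contain("user42@example.com"))        # True
print(bf.might_contain("someone.else@example.com"))  # usually False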
Q 16. Pick a hash function h that maps each of the N elements to at least log2 N bits, Estimated number of distinct elements is
A) 2^R
B) 2^(-R)
C) 1-(2^R)
D) 1-(2^(-R))
Ans. A)
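A minimal Python sketch of the Flajolet-Martin idea behind this question (illustrative only; the hash function choice and sample stream are assumptions): hash each element, track the maximum number of trailing zero bits R seen, and estimate the number of distinct elements as 2^R:

import hashlib

def trailing_zeros(n):
    # Number of 0 bits at the end of n's binary representation.
    if n == 0:
        return 0
    count = 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count

def fm_estimate(stream):
    R = 0
    for item in stream:
        h = int(hashlib.md5(str(item).encode()).hexdigest(), 16)
        R = max(R, trailing_zeros(h))
    return 2 ** R   # estimated number of distinct elements

print(fm_estimate([1, 2, 3, 2, 1, 4, 5, 3, 6, 7, 8, 2]))  # rough estimate of the 8 distinct values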
Q.17 Sliding window operations typically fall in the category
A) OLTP Transactions
B) Big Data Batch Processing
C) Big Data Real Time Processing
D) Small Batch Processing
Ans. C)
Q.18 What is the finally produced by Hierarchical Agglomerative Clustering?
A) final estimate of cluster centroids
B)assignment of each point to clusters
C) tree showing how close things are to each other
D) Group of clusters
Ans. C)
Q19 Which of the algorithm can be used for counting 1's in a stream
A) FM Algorithm
B) PCY Algorithm
C) DGIM Algorithm
D) SON Algorithm
Ans. C)
Q20 Which technique is used to filter unnecessary itemset in PCY algorithm
A) Association Rule
B) Hashing Technique
C) Data Mining
D) Market basket
Ans. B)
Q21 In association rule, which of the following indicates the measure of how frequently the items occur in a dataset ?
A) Support B) Confidence C) Basket D) Itemset
Ans. A)
Q.22 which of the following clustering technique is used by K- Means Algorithm
A) Hierarchical Technique
B) Partitional technique
C)Divisive
D) Agglomerative
Ans. B)
Q.23 which of the following clustering technique is used by Agglomerative Nesting Algorithm
A) Hierarchical Technique
B) Partitional technique
C) Density based
D) None of these
Ans. A)
Q24. Which of the following hierarchical approaches begins with each observation in a distinct (singleton) cluster, and successively merges clusters together until a stopping criterion is satisfied?
A) Divisive
B) Agglomerative
C) Single Link
D) Complete Link
Ans. B)
Q.25 Park, Chen, Yu algorithm is useful for __________in Big Data Application.
A) Find Frequent Itemset
B) Filtering Stream
C) Distinct Element Find
D) None of these
Ans. A)
Q.26 Match the following
a) Bloom filter i) Frequent Pattern Mining
b) FM Algorithm ii) Filtering Stream
c) PCY Algorithm iii) Distinct Element Find
d) DGIM Algorithm iv) Counting 1's in window
A) a)-ii), b)-iii), c)-i), d)-iv)
B) a)-iii), b)-ii), c)-i), d)-iv)
C) a)-i), b)-iii), c)-ii), d)-iv)
D) None of these
Ans. A)
SET II
1. Which of the following can be considered as the correct process of Data Mining? a. Infrastructure, Exploration, Analysis, Interpretation, Exploitation b. Exploration, Infrastructure, Analysis, Interpretation, Exploitation c. Exploration, Infrastructure, Interpretation, Analysis, Exploitation d. Exploration, Infrastructure, Analysis, Exploitation, Interpretation
Answer: a
Explanation: The process of data mining contains many sub-processes in a specific order. The correct order in which all sub-processes of data mining executes is Infrastructure, Exploration, Analysis, Interpretation, and Exploitation.
2. Which of the following is an essential process in which the intelligent methods are applied to extract data patterns? a. Warehousing b. Data Mining c. Text Mining d. Data Selection
Answer: b
Explanation: Data mining is a type of process in which several intelligent methods are used to extract meaningful data from the huge collection (or set) of data.
3. What are the functions of Data Mining? a. Association and correctional analysis classification b. Prediction and characterization
c. Cluster analysis and Evolution analysis d. All of the above
Answer: d
Explanation: In data mining, there are several functionalities used for performing the different types of tasks. The common functionalities used in data mining are cluster analysis, prediction, characterization, and evolution. Still, the association and correctional analysis classification are also one of the important functionalities of data mining.
4. Which attribute is _not_ indicative for data streaming?
a. Limited amount of memory b. Limited amount of processing time c. Limited amount of input data d. Limited amount of processing power
Ans. c
5. Which of the following statements about data streaming is true?
a. Stream data is always unstructured data. b. Stream data often has a high velocity. c. Stream elements cannot be stored on disk. d. Stream data is always structured data.
Ans. b
6. Which of the following statements about sampling are correct? a. Sampling reduces the amount of data fed to a subsequent data mining algorithm b. Sampling reduces the diversity of the data stream c. Sampling increases the amount of data fed to a data mining algorithm d. Sampling algorithms often need multiple passes over the data
Ans. a
7. Which of the following statements about sampling are correct? a. Sampling reduces the diversity of the data stream
b. Sampling increases the amount of data fed to a data mining algorithm c. Sampling algorithms often need multiple passes over the data d. Sampling aims to keep statistical properties of the data intact
Ans. d
8. What is the main difference between standard reservoir sampling and min-wise sampling?
a. Reservoir sampling makes use of randomly generated numbers whereas min-wise sampling does not. b. Min-wise sampling makes use of randomly generated numbers whereas reservoir sampling does not. c. Reservoir sampling requires a stream to be processed sequentially, whereas min-wise does not. d. For larger streams, reservoir sampling creates more accurate samples than min-wise sampling.
Ans. c
9. A Bloom filter guarantees no
a. false positives b. false negatives c. false positives and false negatives d. false positives or false negatives, depending on the Bloom filter type
Ans. b
10. Which of the following statements about standard Bloom filters is correct?
a. It is possible to delete an element from a Bloom filter. b. A Bloom filter always returns the correct result. c. It is possible to alter the hash functions of a full Bloom filter to create more space. d. A Bloom filter always returns TRUE when testing for a previously added element.
Ans. d
11. The FM-sketch algorithm uses the number of zeros the binary hash value ends in to make an estimation. Which of the following statements is true about the hash tail?
a. Any specific bit pattern is equally suitable to be used as hash tail.
b. Only bit patterns with more 0's than 1's are equally suitable to be used as hash tails. c. Only the bit patterns 0000000..00 (list of 0s) or 111111..11 (list of 1s) are suitable hash tails. d. Only the bit pattern 0000000..00 (list of 0s) is a suitable hash tail.
Ans. a
12. The FM-sketch algorithm can be used to:
a. Estimate the number of distinct elements. b. Sample data with a time-sensitive window. c. Estimate the frequent elements. d. Determine whether an element has already occurred in previous stream data.
Ans. a
13. The DGIM algorithm was developed to estimate the count of 1's that occur within the last k bits of a stream window N. Which of the following statements is true about the estimate of the number of 0's based on DGIM?
a. The number of 0's cannot be estimated at all. b. The number of 0's can be estimated with a maximum guaranteed error. c. To estimate the number of 0s and 1s with a guaranteed maximum error, DGIM has to be employed twice, one creating buckets based on 1's, and once created buckets based on 0's. d. None of above
Ans. b
14. Which of the following statements about the standard DGIM algorithm are false? a. DGIM operates on a time-based window b. DGIM reduces memory consumption through a clever way of storing counts c. In DGIM, the size of a bucket is always a power of two d. The maximum number of buckets has to be chosen beforehand. Ans. d
15. Which of the following statements about the standard DGIM algorithm are false? a. DGIM operates on a time-based window b. The buckets contain the count of 1's and each 1's specific position in the stream c. DGIM reduces memory consumption through a clever way of storing counts
d. In DGIM, the size of a bucket is always a power of two Ans. b
16. What are DGIM’s maximum error boundaries?
a. DGIM always underestimates the true count; at most by 25% b. DGIM either underestimates or overestimates the true count; at most by 50% c. DGIM always overestimates the count; at most by 50% d. DGIM either underestimates or overestimates the true count; at most by 25%
Ans. b
17. Which algorithm should be used to approximate the number of distinct elements in a data stream?
a. Misra-Gries b. Alon-Matias-Szegedy c. DGIM d. None of the above
Ans. d
18. Which of the following statements about Bloom filters are correct?
a. A Bloom filter has the same properties as a standard Hashmap data structure in Java (java.util.HashMap). b. A Bloom filter is full if no more hash functions can be added to it. c. A Bloom filter always returns FALSE when testing for an element that was not previously added d. A Bloom filter always returns TRUE when testing for a previously added element
Ans. d
19. Which of the following statements about Bloom filters are correct?
a. An empty Bloom filter (no elements added to it) will always return FALSE when testing for an element b. A Bloom filter has the same properties as a standard Hashmap data structure in Java (java.util.HashMap). c. A Bloom filter is full if no more hash functions can be added to it.
d. A Bloom filter always returns FALSE when testing for an element that was not previously added Ans. a
20. Which of the following streaming windows show valid bucket representations according to the DGIM rules?
a. 1 0 1 1 1 0 1 0 1 1 1 1 0 1 0 1 b. 1 0 1 1 1 0 0 0 0 1 1 0 0 0 1 0 1 1 1 0 0 1 c. 1 1 1 1 0 0 1 1 1 0 1 0 1 d. 1 0 1 1 0 0 0 1 0 1 1 1 0 1 1 0 0 1 0 1 1
Ans. d
21. For which of the following streams is the second-order moment F2 greater than 45?
a. 10 5 5 10 10 10 1 1 1 10
b. 10 10 10 10 10 5 5 5 5 5
c. 1 1 1 1 1 5 10 10 5 1
d. 10 10 10 10 10 10 10 10 10 10
Ans. b and d
22. What is the space complexity of the FREQUENT algorithm? Recall that it aims to find all elements in a sequence whose frequency exceeds 1/k of the total count. In the options below, n is the maximum value of each key and m is the maximum value of each counter.
a. O(k(log m + log n)) b. o(k(log m + log n)) c. O(log k(m + n)) d. o(log k(m + n))
Ans. a
19) Which of the following statements is correct about data mining?
a. It can be referred to as the procedure of mining knowledge from data b. Data mining can be defined as the procedure of extracting information from a set of the data c. The procedure of data mining also involves several other processes like data cleaning, data transformation, and data integration d. All of the above
Answer: d
Explanation: The term data mining can be defined as the process of extracting information from the massive collection of data. In other words, we can also say that data mining is the procedure of mining useful knowledge from a huge set of data.
25) The classification of the data mining system involves:
a. Database technology b. Information Science c. Machine learning d. All of the above
Answer: d
Explanation: Generally, the classification of a data mining system depends on the following criteria: Database technology, machine learning, visualization, information science, and several other disciplines.
27) The issues like efficiency and scalability of data mining algorithms come under _______
a. Performance issues b. Diverse data type issues c. Mining methodology and user interaction d. All of the above
Answer: a
Explanation: In order to extract information effectively from a huge collection of data in databases, the data mining algorithm must be efficient and scalable. Therefore the correct answer is A.
Department of IT
COURSE B.Tech., VI SEM, MCQ Assignment (2020-21) Even Semester UNIT 4 DataAnalytics(KIT601)
1. What does Apriori algorithm do? a. It mines all frequent patterns through pruning rules with lesser support b. It mines all frequent patterns through pruning rules with higher support c. Both a and b d. None of these
Ans. a
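A minimal Python sketch of level-wise frequent-itemset mining in the spirit of Apriori (illustrative only; the toy transactions and the min_support value are assumptions): itemsets that fail the support threshold are pruned, so their supersets are never counted:

def apriori(transactions, min_support):
    def support_count(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in items if support_count(s) >= min_support}
    frequent = []
    while level:
        frequent.extend(level)
        size = len(next(iter(level))) + 1
        # Candidates are built only from itemsets that are already frequent (Apriori pruning).
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        level = {c for c in candidates if support_count(c) >= min_support}
    return frequent

transactions = [frozenset(t) for t in [{"milk", "bread"}, {"milk", "bread", "butter"},
                                       {"bread", "butter"}, {"milk", "butter"}]]
print(apriori(transactions, min_support=2))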
2. What techniques can be used to improve the efficiency of apriori algorithm? a. hash based techniques b. transaction reduction c. Partitioning d. All of these
Ans.d 3. What do you mean by support (A)?
a. Total number of transactions containing A b. Total Number of transactions not containing A c. Number of transactions containing A / Total number of transactions d. Number of transactions not containing A / Total number of transactions
Ans. c 4. Which of the following is direct application of frequent itemset mining? a. Social Network Analysis b. Market Basket Analysis c. outlier detection
d. intrusion detection
Ans. b 5. When do you consider an association rule interesting? a. If it only satisfies min_support b. If it only satisfies min_confidence c. If it satisfies both min_support and min_confidence d. There are other measures to check so
Ans. c
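Worked illustration (hypothetical numbers): in 10 transactions, suppose {milk} appears in 5 and {milk, bread} appears in 4. Then support(milk -> bread) = 4/10 = 0.4 and confidence(milk -> bread) = support(milk and bread) / support(milk) = 4/5 = 0.8; the rule is considered interesting only if both values meet the min_support and min_confidence thresholds.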
6. What is the difference between absolute and relative support? a. Absolute -Minimum support count threshold and Relative-Minimum support threshold b. Absolute-Minimum support threshold and Relative-Minimum support count threshold c. Both a and b d. None of these
Ans. a
7. What is the relation between candidate and frequent itemsets?
a. A candidate itemset is always a frequent itemset b. A frequent itemset must be a candidate itemset c. No relation between the two d. None of these
Ans. b
8. What is the principle on which Apriori algorithm work?
a. If a rule is infrequent, its specialized rules are also infrequent b. If a rule is infrequent, its generalized rules are also infrequent c. Both a and b d. None of these
Ans. a
9. Which of these is not a frequent pattern mining algorithm a. Apriori b. FP growth c. Decision trees d. Eclat
Ans. c
10. What are closed frequent itemsets?
a. A closed itemset b. A frequent itemset c. An itemset which is both closed and frequent d. None of these
Ans. c
11. What are maximal frequent itemsets? a. A frequent item set whose no super-itemset is frequent b. A frequent itemset whose super-itemset is also frequent c. Both a and b d. None of these
Ans. a
12. What is association rule mining?
a. Same as frequent itemset mining b. Finding of strong association rules using frequent itemsets c. Both a and b d. None of these
Ans. b
13. What is frequent pattern growth?
a. Same as frequent itemset mining b. Use of hashing to make discovery of frequent itemsets more efficient c. Mining of frequent itemsets without candidate generation d. None of these
Ans. c
14. When is sub-itemset pruning done?
a. A frequent itemset ‘P’ is a proper subset of another frequent itemset ‘Q’ b. Support (P) = Support(Q) c. When both a and b is true d. When a is true and b is not
Ans. c
15. Our use of association analysis will yield the same frequent itemsets and strong association rules whether a specific item occurs once or three times in an individual transaction
a. TRUE b. FALSE c. Both a and b d. None of these
Ans. a
16. The number of iterations in apriori __
a. increases with the size of the data b. decreases with the increase in size of the data c. increases with the size of the maximum frequent set d. decreases with increase in size of the maximum frequent set
Ans. c
17. Frequent item sets are a. Superset of only closed frequent item sets b. Superset of only maximal frequent item sets c. Subset of maximal frequent item sets d. Superset of both closed frequent item sets and maximal frequent item sets
Ans. d
18. Significant Bottleneck in the Apriori algorithm is a. Finding frequent itemsets b. pruning c. Candidate generation d. Number of iterations
Ans. c
19. Which Association Rule would you prefer a. High support and medium confidence b. High support and low confidence c. Low support and high confidence d. Low support and low confidence
Ans. c
20. The apriori property means a. If a set cannot pass a test, its supersets will also fail the same test b. To decrease the efficiency, do level-wise generation of frequent item sets c. To improve the efficiency, do level-wise generation of frequent item sets d. If a set can pass a test, its supersets will fail the same test
Ans. a
21. To determine association rules from frequent item sets a. Only minimum confidence needed b. Neither support not confidence needed c. Both minimum support and confidence are needed d. Minimum support is needed
Ans. c
22. A collection of one or more items is called as _____
( a ) Itemset ( b ) Support ( c ) Confidence ( d ) Support Count Ans. a
23. Frequency of occurrence of an itemset is called as _____
(a) Support (b) Confidence (c) Support Count (d) Rules Ans. c
24. An itemset whose support is greater than or equal to a minimum support threshold is ______
(a) Itemset (b) Frequent Itemset (c) Infrequent items (d) Threshold values
Ans. b
25. The goal of clustering is to- a. Divide the data points into groups b. Classify the data point into different classes c. Predict the output values of input data points d. All of the above
Ans. a
26. Clustering is a- a. Supervised learning b. Unsupervised learning c. Reinforcement learning d. None Ans. b 27. Which of the following clustering algorithms suffers from the problem of convergence at local optima? a. K- Means clustering b. Hierarchical clustering c. Diverse clustering d. All of the above Ans. d
28. Which version of the clustering algorithm is most sensitive to outliers? a. K-means clustering algorithm b. K-modes clustering algorithm c. K-medians clustering algorithm d. None
Ans. a 29. Which of the following is a bad characteristic of a dataset for clustering analysis-
a. Data points with outliers b. Data points with different densities c. Data points with non-convex shapes d. All of the above Ans. d
30. For clustering, we do not require- a. Labeled data b. Unlabeled data c. Numerical data d. Categorical data
Ans. a 31. The final output of Hierarchical clustering is- a. The number of cluster centroids b. The tree representing how close the data points are to each other c. A map defining the similar data points into individual groups d. All of the above Ans. b
32. Which of the step is not required for K-means clustering?
a. a distance metric b. initial number of clusters c. initial guess as to cluster centroids d. None Ans. d
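A minimal Python sketch of plain k-means on 2-D points (illustrative only; the points, k, and iteration count are assumptions): it needs a distance metric, an initial number of clusters, and an initial guess for the centroids, then alternates assignment and centroid updates:

import random

def kmeans(points, k, iters=20):
    centroids = random.sample(points, k)           # initial guess for the centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            idx = min(range(k),
                      key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2)
            clusters[idx].append(p)
        for i, c in enumerate(clusters):
            if c:                                  # keep the old centroid if a cluster is empty
                centroids[i] = (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
    return centroids, clusters

points = [(1, 1), (1.5, 2), (3, 4), (5, 7), (3.5, 5), (4.5, 5), (3.5, 4.5)]
print(kmeans(points, 2)[0])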
33. Which of the following uses a merging approach? a. Hierarchical clustering b. Partitional clustering c. Density-based clustering d. All of the above Ans. a
34. When does k-means clustering stop creating or optimizing clusters? a. After finding no new reassignment of data points b. After the algorithm reaches the defined number of iterations c. Both A and B d. None Ans. c
35. Which of the following clustering algorithms follows a top to bottom approach? a. K-means b. Divisive c. Agglomerative d. None Ans. b
36. Which algorithm does not require a dendrogram? a. K-means b. Divisive c. Agglomerative d. None
Ans. a 37. What is a dendrogram?
a. A hierarchical structure b. A diagram structure c. A graph structure d. None
Ans. a
38. Which one of the following can be considered as the final output of the hierarchal type of clustering? a. A tree which displays how the close thing are to each other b. Assignment of each point to clusters c. Finalize estimation of cluster centroids d. None of the above
Ans. a
39. Which one of the following statements about the K-means clustering is incorrect?
a. The goal of the k-means clustering is to partition (n) observation into (k) clusters b. K-means clustering can be defined as the method of quantization c. The nearest neighbor is the same as the K-means d. All of the above
Ans. c
40. The self-organizing maps can also be considered as the instance of _________ type of learning.
a. Supervised learning b. Unsupervised learning c. Missing data imputation d. Both A & C
Ans. b
41. Euclidean distance measure can also be defined as ___________
a. The process of finding a solution for a problem simply by enumerating all possible solutions according to some predefined order and then testing them
b. The distance between two points as calculated using the Pythagoras theorem c. A stage of the KDD process in which new data is added to the existing selection. d. All of the above
Ans. b
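Worked check: for two points (x1, y1) and (x2, y2), the Euclidean distance is sqrt((x1 - x2)^2 + (y1 - y2)^2); for example, the distance between (1, 2) and (4, 6) is sqrt(9 + 16) = 5.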
42. Which of the following refers to the sequence of pattern that occurs frequently?
a. Frequent sub-sequence b. Frequent sub-structure c. Frequent sub-items d. All of the above
Ans. a 43. Which method of analysis does not classify variables as dependent or independent? a) Regression analysis b) Discriminant analysis c) Analysis of variance d) Cluster analysis Answer: (d)
Department of IT
COURSE B.Tech., VI SEM, MCQ Assignment (2020-21) Even Semester DataAnalytics(KIT601)
1. The Process of describing the data that is huge and complex to store and process is known as
a. Analytics b. Data mining c. Big Data d. Data Warehouse
Ans C
2. Data generated from online transactions is one of the example for volume of big data. Is this true or False. a. TRUE b. FALSE
Ans. a 3. Velocity is the speed at which the data is processed
a. TRUE b. FALSE
Ans. b
4. _____________ have a structure but cannot be stored in a database.
a. Structured b. Semi-Structured c. Unstructured d. None of these
Ans. b 5. ____________refers to the ability to turn your data useful for business.
a. Velocity b. Variety c. Value d. Volume
Ans. C
6. Value tells the trustworthiness of data in terms of quality and accuracy.
a. TRUE b. FALSE
Ans. b 7. GFS consists of a ____________ Master and ___________ Chunk Servers a. Single, Single b. Multiple, Single c. Single, Multiple
d. Multiple, Multiple
Ans. c
8. Files are divided into ____________ sized Chunks. a. Static b. Dynamic c. Fixed d. Variable Ans. c
9. ____________is an open source framework for storing data and running application on clusters of commodity hardware. a. HDFS b. Hadoop c. MapReduce d. Cloud Ans. B
10. How much data (in MB) does HDFS store in each block, which can be scaled at any time? a. 32 b. 64 c. 128 d. 256 Ans. c
11. Hadoop MapReduce allows you to perform distributed parallel processing on large volumes of data quickly and efficiently. True or False? a. TRUE b. FALSE Ans. a
12. Hortonworks was introduced by Cloudera and owned by Yahoo. a. TRUE b. FALSE Ans. b
13. Hadoop YARN is used for Cluster Resource Management in Hadoop Ecosystem. a. TRUE b. FALSE Ans. a
14. Google Introduced MapReduce Programming model in 2004. a. TRUE b. FALSE Ans. A
15.______________ phase sorts the data & ____________creates logical clusters. a. Reduce, YARN b. MAP, YARN c. REDUCE, MAP d. MAP, REDUCE Ans. d
16. There is only one operation between Mapping and Reducing. True or False?
a. TRUE b. FALSE
Ans. A
17. __________ is a factor considered before adopting Big Data technology. a. Validation b. Verification c. Data d. Design Ans. a
18. _________ analytics is used for improving supply chain management to optimize stock management, replenishment, and forecasting. a. Descriptive b. Diagnostic c. Predictive d. Prescriptive Ans. c
19. Which among the following is not a data mining and analytical application? a. profile matching b. social network analysis c. facial recognition d. Filtering Ans. d
20. ________________ as a result of data accessibility, data latency, data availability, or limits on bandwidth in relation to the size of inputs. a. Computation-restricted throttling b. Large data volumes c. Data throttling d. Benefits from data parallelization Ans. c
21. As an example, an expectation of using a recommendation engine would be to increase same-customer sales by adding more items into the market basket. a. Lowering costs b. Increasing revenues c. Increasing productivity d. Reducing risk Ans. b
22. Which capability allows a storage subsystem to support massive data volumes of increasing size? a. Extensibility b. Fault tolerance c. Scalability d. High-speed I/O capacity Ans. c
23. ______________provides performance through distribution of data and fault tolerance through replication a. HDFS b. PIG c. HIVE d. HADOOP
Ans. a
24. ______________ is a programming model for writing applications that can process Big Data in parallel on multiple nodes. a. HDFS b. MAP REDUCE c. HADOOP d. HIVE Ans. b
25. _____________________ takes the grouped key-value paired data as input and runs a Reducer function on each one of them. a. MAPPER b. REDUCER c. COMBINER d. PARTITIONER Ans. b
26. _______________ is a type of local Reducer that groups similar data from the map phase into identifiable sets. a. MAPPER b. REDUCER c. COMBINER d. PARTITIONER. Ans. c
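To make the Mapper/Reducer/Combiner roles in questions 24-26 concrete, here is a rough word-count sketch in plain R; it only mimics the idea and is not actual Hadoop MapReduce code:
# Illustrative only: the MapReduce idea (map -> shuffle/group -> reduce) in plain R
docs <- c("big data", "data lake", "big big data")
# Map: emit a (word, 1) pair for every word in every document
pairs <- unlist(lapply(strsplit(docs, " "),
                       function(ws) setNames(rep(1, length(ws)), ws)))
# Shuffle/group: gather all values that share the same key (word)
grouped <- split(unname(pairs), names(pairs))
# Reduce: combine each group into a smaller set of (word, count) pairs
counts <- sapply(grouped, sum)
counts  # big = 3, data = 3, lake = 1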
27. MongoDB is __________________ a. Column Based b. Key Value Based c. Document Based d. Graph Based Ans. c
28. ____________ is the process of storing data records across multiple machines a. Sharding b. HDFS c. HIVE d. HBASE Ans. a
29. The results of a hive query can be stored as a. Local File b. HDFS File c. Both d. Cannot be stored Ans. c 30. The position of a specific column in a Hive table a. can be anywhere in the table creation clause b. must match the position of the corresponding data in the data file c. Must match the position only for date time data type in the data file d. Must be arranged alphabetically Ans. b 31. The Hbase tables are A. Made read only by setting the read-only option B. Always writeable
C. Always read-only D. Are made read only using the query to the table
Ans. a 32. HBase creates a new version of a record during A. Creation of a record B. Modification of a record C. Deletion of a record D. All the above Ans. d 33. Which among the following is incorrect with regard to NoSQL? a. It is easy and ready to manage with clusters. b. Suitable for upcoming data explosions. c. It requires keeping track of the data structure. d. Provides an easy and flexible system. Ans. c 34. Which NoSQL database administrator job was trending according to job trends? a. MongoDB b. CouchDB c. SimpleDB d. Redis Ans. a 35. NoSQL means _________________ a. Not SQL b. No usage of SQL c. Not Only SQL d. Not for SQL Ans. c 36. A list of 5 pulse rates is: 70, 64, 80, 74, 92. What is the median for this list? a. 74 b. 76 c. 77 d. 80 Ans. a 37. Which of the following would indicate that a dataset is not bell-shaped? a. The range is equal to 5 standard deviations. b. The range is larger than the interquartile range. c. The mean is much smaller than the median. d. There are no outliers Ans. c 38. What is the effect of an outlier on the value of a correlation coefficient? a. An outlier will always decrease a correlation coefficient. b. An outlier will always increase a correlation coefficient. c. An outlier might either decrease or increase a correlation coefficient, depending on where it is in relation to the other points. d. An outlier will have no effect on a correlation coefficient. Ans. c 39. One use of a regression line is a. to determine if any x-values are outliers. b. to determine if any y-values are outliers. c. to determine if a change in x causes a change in y. d. to estimate the change in y for a one-unit change in x. Ans. d 40. Which package contains most of the basic functions in R? a. Root b. Basic c. Parent
d. R
Ans. b
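Questions 36-39 above can be checked quickly in R; a small illustrative sketch (all numbers other than the pulse list are made up):
# Illustrative only: median, an outlier's effect on correlation, and a regression slope
pulse <- c(70, 64, 80, 74, 92)
median(pulse)                # 74, the middle value of the sorted list
set.seed(1)
x <- 1:10
y <- 2 * x + rnorm(10)
cor(x, y)                    # strong positive correlation
cor(c(x, 30), c(y, -50))     # a single extreme point can push it either way
coef(lm(y ~ x))["x"]         # estimated change in y for a one-unit change in x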
SET II
1. Who was the developer of Hadoop?
A. Apache Software Foundation B. Hadoop Software Foundation C. Sun Microsystems D. Bell Labs View Answer Ans : A
Explanation: Hadoop Developed by: Apache Software Foundation.
2. Hadoop is written in which language?
A. C B. C++ C. Java D. Python View Answer Ans : C
Explanation: Hadoop is written in Java. 3. What was the initial release date of Hadoop?
A. 1st April 2007 B. 1st April 2006 C. 1st April 2008 D. 1st April 2005 View Answer Ans : B
Explanation: Initial release: April 1, 2006. 4. What license is Hadoop distributed under?
A. Apache License 2.1 B. Apache License 2.2 C. Apache License 2.0 D. Apache License 1.0 View Answer Ans : C
Explanation: Hadoop is Open Source, released under Apache 2 license.
5. IBM and ________ have announced a major initiative to use Hadoop to support university courses in distributed computer programming.
A. Google B. Apple C. Facebook D. Microsoft View Answer Ans : A
Explanation: Google and IBM announced a university initiative to address Internet-scale computing. 6. On which platform does Hadoop run?
A. Bare metal B. Debian C. Cross-platform D. Unix-Like View Answer Ans : C
Explanation: Hadoop has support for cross platform operating system.
10. Which of the following is not a feature of Hadoop?
A. Suitable for Big Data Analysis B. Scalability C. Robust D. Fault Tolerance View Answer Ans : C
Explanation: Robust is not a feature of Hadoop.
1. The MapReduce algorithm contains two important tasks, namely __________.
A. mapped, reduce B. mapping, Reduction C. Map, Reduction D. Map, Reduce View Answer Ans : D
Explanation: The MapReduce algorithm contains two important tasks, namely Map and Reduce. 2. _____ takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
A. Map B. Reduce C. Both A and B D. Node View Answer Ans : A
Explanation: Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). 3. ______ task, which takes the output from a map as an input and combines those data tuples into a smaller set of tuples.
A. Map B. Reduce C. Node D. Both A and B View Answer Ans : B
Explanation: Reduce task, which takes the output from a map as an input and combines those data tuples into a smaller set of tuples. 4. In how many stages does the MapReduce program execute?
A. 2 B. 3 C. 4 D. 5 View Answer
Ans : B
Explanation: A MapReduce program executes in three stages, namely the map stage, shuffle stage, and reduce stage. 5. Which of the following is used to schedule jobs and track the jobs assigned to the Task Tracker?
A. SlaveNode B. MasterNode C. JobTracker D. Task Tracker View Answer Ans : C
Explanation: JobTracker : Schedules jobs and tracks the assign jobs to Task tracker. 6. Which of the following is used for an execution of a Mapper or a Reducer on a slice of data?
A. Task B. Job C. Mapper D. PayLoad View Answer Ans : A
Explanation: Task : An execution of a Mapper or a Reducer on a slice of data. 7. Which of the following commands runs a DFS admin client?
A. secondaryadminnode B. nameadmin C. dfsadmin D. adminsck View Answer Ans : C
Explanation: dfsadmin : Runs a DFS admin client. 8. Point out the correct statement.
A. MapReduce tries to place the data and the compute as close as possible B. Map Task in MapReduce is performed using the Mapper() function C. Reduce Task in MapReduce is performed using the Map() function D. None of the above View Answer Ans : A
Explanation: This feature of MapReduce is "Data Locality". 9. Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in ____________
A. C B. C# C. Java D. None of the above View Answer
Ans : C
Explanation: Hadoop Pipes is a SWIG- compatible C++ API to implement MapReduce applications (non JNITM based). 10. The number of maps is usually driven by the total size of ____________
A. Inputs B. Output C. Task D. None of the above View Answer Ans : A
Explanation: Total size of inputs means the total number of blocks of the input files. 1. What is the full form of HDFS?
A. Hadoop File System B. Hadoop Field System C. Hadoop File Search D. Hadoop Field search View Answer Ans : A
Explanation: Hadoop File System was developed using distributed file system design. 2. HDFS works in a __________ fashion.
A. worker-master fashion B. master-slave fashion C. master-worker fashion D. slave-master fashion View Answer Ans : B
Explanation: HDFS follows the master-slave architecture. 3. Which of the following are the Goals of HDFS?
A. Fault detection and recovery B. Huge datasets C. Hardware at data D. All of the above View Answer Ans : D
Explanation: All the above option are the goals of HDFS. 4. ________ NameNode is used when the Primary NameNode goes down.
A. Rack B. Data C. Secondary D. Both A and B View Answer Ans : C
Explanation: Secondary namenode is used for all time availability and reliability.
5. The minimum amount of data that HDFS can read or write is called a _____________.
A. Datanode B. Namenode C. Block D. None of the above View Answer Ans : C
Explanation: The minimum amount of data that HDFS can read or write is called a Block. 6. The default block size is ______.
A. 32MB B. 64MB C. 128MB D. 16MB View Answer Ans : B
Explanation: The default block size is 64MB, but it can be increased as per the need to change in HDFS configuration. 7. For every node (Commodity hardware/System) in a cluster, there will be a _________.
A. Datanode B. Namenode C. Block D. None of the above View Answer Ans : A
Explanation: For every node (Commodity hardware/System) in a cluster, there will be a datanode. 8. Which of the following is not Features Of HDFS?
A. It is suitable for the distributed storage and processing. B. Streaming access to file system data. C. HDFS provides file permissions and authentication. D. Hadoop does not provides a command interface to interact with HDFS. View Answer Ans : D
Explanation: The correct feature is Hadoop provides a command interface to interact with HDFS. 9. HDFS is implemented in _____________ language.
A. Perl B. Python C. Java D. C View Answer Ans : C
Explanation: HDFS is implemented in Java and any computer which can run Java can host a NameNode/DataNode on it.
10. During start up, the ___________ loads the file system state from the fsimage and the edits log file.
A. Datanode B. Namenode C. Block D. ActionNode View Answer Ans : B
Explanation: Any computer which can run Java can host a NameNode/DataNode, since HDFS is implemented in Java. 1. Which of the following is not true about Pig?
A. Apache Pig is an abstraction over MapReduce B. Pig can not perform all the data manipulation operations in Hadoop. C. Pig is a tool/platform which is used to analyze larger sets of data representing them as data flows. D. None of the above View Answer Ans : B
Explanation: Pig is generally used with Hadoop; we can perform all the data manipulation operations in Hadoop using Apache Pig. 2. Which of the following is/are a feature of Pig?
A. Rich set of operators B. Ease of programming C. Extensibility D. All of the above View Answer Ans : D
Explanation: All options are the following Features of Pig. 3. In which year apache Pig was released?
A. 2005 B. 2006 C. 2007 D. 2008 View Answer Ans : B
Explanation: In 2006, Apache Pig was developed as a research project. 4. Pig mainly operates in how many modes?
A. 2 B. 3 C. 4 D. 5 View Answer Ans : A
Explanation: You can run Pig (execute Pig Latin statements and Pig commands) in two modes: Interactive mode and Batch mode. 5. Which of the following companies developed Pig?
A. Google B. Yahoo C. Microsoft D. Apple View Answer Ans : B
Explanation: Apache Pig was developed as a research project at Yahoo, especially to create and execute MapReduce jobs on every dataset. 6. Which of the following function is used to read data in PIG?
A. Write B. Read C. Perform D. Load View Answer Ans : D
Explanation: PigStorage is the default load function. 7. __________ is a framework for collecting and storing script-level statistics for Pig Latin.
A. Pig Stats B. PStatistics C. Pig Statistics D. All of the above View Answer Ans : C
Explanation: The new Pig statistics and the existing Hadoop statistics can also be accessed via the Hadoop job history file. 8. Which of the following is true statement?
A. Pig is a high level language. B. Performing a Join operation in Apache Pig is pretty simple. C. Apache Pig is a data flow language. D. All of the above View Answer Ans : D
Explanation: All option are true statement. 9. Which of the following will compile the Pigunit?
A. $pig_trunk ant pigunit-jar B. $pig_tr ant pigunit-jar C. $pig_ ant pigunit-jar D. $pigtr_ ant pigunit-jar View Answer Ans : A
Explanation: The compile will create the pigunit.jar file.
10. Point out the wrong statement.
A. Pig can invoke code in language like Java Only B. Pig enables data workers to write complex data transformations without knowing Java C. Pig's simple SQL-like scripting language is called Pig Latin, and appeals to developers already familiar with scripting languages and SQL D. Pig is complete, so you can do all required data manipulations in Apache Hadoop with Pig View Answer Ans : A
Explanation: Through the User Defined Functions(UDF) facility in Pig, Pig can invoke code in many languages like JRuby, Jython and Java. 1. Which of the following is/are INCORRECT with respect to Hive?
A. Hive provides SQL interface to process large amount of data B. Hive needs a relational database like oracle to perform query operations and store data. C. Hive works well on all files stored in HDFS D. Both A and B View Answer Ans : B
Explanation: Hive needs a relational database like oracle to perform query operations and store data is incorrect with respect to Hive. 2. Which of the following is not a Features of HiveQL?
A. Supports joins B. Supports indexes C. Support views D. Support Transactions View Answer Ans : D
Explanation: Support for transactions is not a feature of HiveQL. 3. Which of the following operators executes a shell command from the Hive shell?
A. | B. ! C. # D. $ View Answer Ans : B
Explanation: Exclamation operator is for execution of command. 4. Hive uses _________ for logging.
A. logj4 B. log4l C. log4i D. log4j View Answer Ans : D
Explanation: By default Hive will use hive-log4j.default in the conf/ directory of the Hive installation. 5. HCatalog is installed with Hive, starting with Hive release ___________
A. 0.10.0 B. 0.9.0 C. 0.11.0 D. 0.12.0 View Answer Ans : C
Explanation: hcat commands can be issued as hive commands, and vice versa. 6. _______ supports a new command shell Beeline that works with HiveServer2.
A. HiveServer2 B. HiveServer3 C. HiveServer4 D. HiveServer5 View Answer Ans : A
Explanation: The Beeline shell works in both embedded mode as well as remote mode. 7. The ________ allows users to read or write Avro data as Hive tables.
A. AvroSerde B. HiveSerde C. SqlSerde D. HiveQLSerde View Answer Ans : A
Explanation: AvroSerde understands compressed Avro files. 8. Which of the following data type is supported by Hive?
A. map B. record C. string D. enum View Answer Ans : D
Explanation: Hive has no concept of enums. 9. We need to store skill set of MCQs(which might have multiple values) in MCQs table, which of the following is the best way to store this information in case of Hive?
A. Create a column in MCQs table of STRUCT data type B. Create a column in MCQs table of MAP data type C. Create a column in MCQs table of ARRAY data type D. As storing multiple values in a column of MCQs itself is a violation View Answer Ans : C
Explanation: Option C is correct.
10. Letsfindcourse is generating a huge amount of data. They are generating a huge amount of sensor data from different courses, which is unstructured in form. They moved to the Hadoop framework for storing and analyzing data. Which technology in the Hadoop framework can they use to analyse this unstructured data?
A. MapReduce programming B. Hive C. RDBMS D. None of the above View Answer Ans : A
Explanation: MapReduce programming is the right answer. 1. which of the following is correct statement?
A. HBase is a distributed column-oriented database B. Hbase is not open source C. Hbase is horizontally scalable. D. Both A and C View Answer Ans : D
Explanation: HBase is a distributed column-oriented database built on top of the Hadoop file system. It is an open-source project and is horizontally scalable. 2. which of the following is not a feature of Hbase?
A. HBase is lateral scalable. B. It has automatic failure support. C. It provides consistent read and writes. D. It has easy java API for client. View Answer Ans : A
Explanation: Option A is incorrect because HBase is linearly scalable. 3. When was HBase first released?
A. April 2007 B. March 2007 C. February 2007 D. May 2007 View Answer Ans : C
Explanation: HBase was first released in February 2007. Later in January 2008, HBase became a sub project of Apache Hadoop. 4. Apache HBase is a non-relational database modeled after Google's _________
A. BigTop B. Bigtable C. Scanner D. FoundationDB View Answer Ans : B
Explanation: Bigtable acts up on Google File System, likewise Apache HBase works on top of Hadoop and HDFS. 5. HBaseAdmin and ____________ are the two important classes in this package that provide DDL functionalities.
A. HTableDescriptor B. HDescriptor C. HTable D. HTabDescriptor View Answer Ans : A
Explanation: Java provides an Admin API to achieve DDL functionalities through programming 6. which of the following is correct statement?
A. HBase provides fast lookups for larger tables. B. It provides low latency access to single rows from billions of records C. HBase is a database built on top of the HDFS. D. All of the above View Answer Ans : D
Explanation: All the options are correct. 7. HBase supports a ____________ interface via Put and Result.
A. bytes-in/bytes-out B. bytes-in C. bytes-out D. None of the above View Answer Ans : A
Explanation: Input could be strings, numbers, complex objects, or even images as long as they can rendered as bytes. 8. Which command is used to disable all the tables matching the given regex?
A. remove all B. drop all C. disable_all D. None of the above View Answer Ans : C
Explanation: The syntax for disable_all command is as follows : hbase > disable_all 'r.*' 9. _________ is the main configuration file of HBase.
A. hbase.xml B. hbase-site.xml C. hbase-site-conf.xml D. hbase-conf.xml View Answer Ans : B
Explanation: Set the data directory to an appropriate location by opening the HBase home folder in /usr/local/HBase. 10. which of the following is incorrect statement?
A. HBase is built for wide tables B. Transactions are there in HBase. C. HBase has de-normalized data. D. HBase is good for semi-structured as well as structured data. View Answer Ans : B
Explanation: No transactions are there in HBase. 1. R was created by?
A. Ross Ihaka B. Robert Gentleman C. Both A and B D. Ross Gentleman View Answer Ans : C
Explanation: R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team. 2. R allows integration with the procedures written in the?
A. C B. Ruby C. Java D. Basic View Answer Ans : A
Explanation: R allows integration with the procedures written in the C, C++, .Net, Python or FORTRAN languages for efficiency. 3. R is free software distributed under a GNU-style copy left, and an official part of the GNU project called?
A. GNU A B. GNU S C. GNU L D. GNU R View Answer Ans : B
Explanation: R is free software distributed under a GNU-style copy left, and an official part of the GNU project called GNU S. 4. R made its first appearance in?
A. 1992 B. 1995 C. 1993 D. 1994 View Answer
Ans : C
Explanation: R made its first appearance in 1993. 5. Which of the following is true about R?
A. R is a well-developed, simple and effective programming language B. R has an effective data handling and storage facility C. R provides a large, coherent and integrated collection of tools for data analysis. D. All of the above View Answer Ans : D
Explanation: All of the above statement are true. 6. Point out the wrong statement?
A. Setting up a workstation to take full advantage of the customizable features of R is a straightforward thing B. q() is used to quit the R program C. R has an inbuilt help facility similar to the man facility of UNIX D. Windows versions of R have other optional help systems also View Answer Ans : B
Explanation: help command is used for knowing details of particular command in R. 7. Command lines entered at the console are limited to about ________ bytes
A. 4095 B. 4096 C. 4097 D. 4098 View Answer Ans : A
Explanation: Elementary commands can be grouped together into one compound expression by braces (‘{’ and ‘}’). 8. R language is a dialect of which of the following languages?
A. s B. c C. sas D. matlab View Answer Ans : A
Explanation: The R language is a dialect of S which was designed in the 1980s. Since the early 90’s the life of the S language has gone down a rather winding path. The scoping rules for R are the main feature that makes it different from the original S language. 9. How many atomic vector types does R have?
A. 3 B. 4 C. 5 D. 6 View Answer
Ans : D
Explanation: R language has 6 atomic data types. They are logical, integer, real, complex, string (or character) and raw. There is also a class for “raw” objects, but they are not commonly used directly in data analysis. 10. R files has an extension _____.
A. .S B. .RP C. .R D. .SP View Answer Ans : C
Explanation: All R files have an extension .R. R provides a mechanism for recalling and reexecuting previous commands. All S programmed files will have an extension .S. But R has many functions than S. 1. What will be output for the following code?
v <- TRUE
print(class(v))
A. logical B. Numeric C. Integer D. Complex View Answer Ans : A
Explanation: It produces the following result : [1] "logical"
2. What will be output for the following code?
v <- ""TRUE""
print(class(v))
A. logical B. Numeric C. Integer D. Character View Answer Ans : D
Explanation: It produces the following result : [1] "character"
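The two questions above generalise to R's other atomic types; an illustrative console check:
# Illustrative only: class() on a few basic R values
class(TRUE)       # "logical"
class("TRUE")     # "character" (the quotes make it a string)
class(3L)         # "integer"
class(3.5)        # "numeric"
class(2 + 3i)     # "complex"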
3. In R programming, the very basic data types are the R-objects called?
A. Lists B. Matrices
C. Vectors D. Arrays View Answer Ans : C
Explanation: In R programming, the very basic data types are the R-objects called vectors
4. Data Frames are created using the?
A. frame() function B. data.frame() function C. data() function D. frame.data() function View Answer Ans : B
Explanation: Data Frames are created using the data.frame() function 5. Which functions gives the count of levels?
A. level B. levels C. nlevels D. nlevel View Answer Ans : C
Explanation: Factors are created using the factor() function. The nlevels functions gives the count of levels. 6. Point out the correct statement?
A. Empty vectors can be created with the vector() function B. A sequence is represented as a vector but can contain objects of different classes C. "raw” objects are commonly used directly in data analysis D. The value NaN represents undefined value View Answer Ans : A
Explanation: A vector can only contain objects of the same class. 7. What will be the output of the following R code?
> x <- vector(""numeric"", length = 10)
> x
A. 1 0 B. 0 0 0 0 0 0 0 0 0 0 C. 0 1 D. 0 0 1 1 0 1 1 0 View Answer Ans : B
Explanation: You can also use the vector() function to initialize vectors.
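A short illustrative sketch pulling together the objects asked about in questions 3-7 (vectors, data frames, factors and vector()):
# Illustrative only: the basic R objects from questions 3-7
v  <- c(2, 4, 6)                          # a vector; all elements share one class
df <- data.frame(id = 1:3, score = v)     # data frames come from data.frame()
f  <- factor(c("low", "high", "low"))     # factors come from factor()
nlevels(f)                                # 2 -- the count of levels
x  <- vector("numeric", length = 10)      # a numeric vector of ten zeros
x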
8. What will be output for the following code?
> sqrt(-17)
A. -4.02 B. 4.02 C. 3.67 D. NAN View Answer Ans : D
Explanation: The square root of a negative number is not defined for real values, so R returns NaN and prints a warning ("NaNs produced"). 9. _______ function returns a vector of the same size as x with the elements arranged in increasing order.
A. sort() B. orderasc() C. orderby() D. sequence() View Answer Ans : A
Explanation: There are other more flexible sorting facilities available like order() or sort.list() which produce a permutation to do the sorting. 10. What will be the output of the following R code?
> m <- matrix(nrow = 2, ncol = 3)
> dim(m)
A. 3 3 B. 3 2 C. 2 3 D. 2 2 View Answer Ans : C
Explanation: Matrices are constructed column-wise. 1. Which loop executes a sequence of statements multiple times and abbreviates the code that manages the loop variable?
A. for B. while C. do-while D. repeat View Answer Ans : D
Explanation: repeat loop : Executes a sequence of statements multiple times and abbreviates the code that manages the loop variable. 2. Which of the following true about for loop?
A. Repeats a statement or group of statements while a given condition is true. It tests the condition before executing the loop body. B. it tests the condition at the end of the loop body. C. Both A and B D. None of the above View Answer Ans : B
Explanation: for loop : Like a while statement, except that it tests the condition at the end of the loop body. 3. Which statement simulates the behavior of R switch?
A. Next B. Previous C. break D. goto View Answer Ans : A
Explanation: The next statement simulates the behavior of R switch. 4. In which statement terminates the loop statement and transfers execution to the statement immediately following the loop?
A. goto B. switch C. break D. label View Answer Ans : C
Explanation: Break : Terminates the loop statement and transfers execution to the statement immediately following the loop. 5. Point out the wrong statement?
A. Multi-line expressions with curly braces are just not that easy to sort through when working on the command line B. lapply() loops over a list, iterating over each element in that list C. lapply() does not always return a list D. You cannot use lapply() to evaluate a function multiple times each with a different argument View Answer Ans : C
Explanation: lapply() always returns a list, regardless of the class of the input. 6. The mapply() function is a multivariate apply of sorts which applies a function in parallel over a set of arguments.
A. TRUE B. FALSE C. Can be true or false D. Can not say View Answer Ans : A
Explanation: True, The mapply() function is a multivariate apply of sorts which applies a function in parallel over a set of arguments. 7. Which of the following is valid body of split function?
A. function (x, f) B. function (x, f, drop = FALSE, …) C. function (x, drop = FALSE, …) D. function (drop = FALSE, …) View Answer Ans : B
Explanation: x is a vector (or list) or data frame. 8. Which of the following characters is skipped during execution?
v <- LETTERS[1:6]
for ( i in v) {
if (i == ""D"") {
next
}
print(i)
}
A. A B. B C. C D. D View Answer Ans : D
Explanation: When the above code is compiled and executed, it produces the following result : [1] "A" [1] "B" [1] "C" [1] "E" [1] "F"
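For contrast with next in question 8, an illustrative sketch of break, which question 4 describes (it ends the loop entirely rather than skipping one iteration):
# Illustrative only: break ends the loop; next only skips the current iteration
for (i in LETTERS[1:6]) {
  if (i == "D") {
    break          # control jumps to the first statement after the loop
  }
  print(i)         # prints "A" "B" "C", then the loop terminates
}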
9. What will be output for the following code?
v <- LETTERS[1]
for ( i in v) {
print(v)
}
A. A B. A B C. A B C D. A B C D View Answer Ans : A
Explanation: The output for the following code : [1] "A" 10. What will be output for the following code?
v <- LETTERS[""A""]
for ( i in v) {
print(v)
}
A. A B. NAN C. NA D. Error View Answer Ans : C
Explanation: The output for the following code : [1] NA 1. An R function is created by using the keyword?
A. fun B. function C. declare D. extends View Answer Ans : B
Explanation: An R function is created by using the keyword function. 2. What will be output for the following code?
print(mean(25:82))
A. 1526 B. 53.5 C. 50.5 D. 55 View Answer Ans : B
Explanation: The code will find mean of numbers from 25 to 82 that is 53.5 3. Point out the wrong statement?
A. Functions in R are “second class objects” B. The writing of a function allows a developer to create an interface to the code, that is explicitly specified with a set of parameters
C. Functions provides an abstraction of the code to potential users D. Writing functions is a core activity of an R programmer View Answer Ans : A
Explanation: Functions in R are “first class objects”, which means that they can be treated much like any other R object. 4. What will be output for the following code?
> paste("a", "b", se = ":")
A. a+b B. a:b C. a-b D. None of the above View Answer Ans : D
Explanation: With the paste() function, the arguments sep and collapse must be named explicitly and in full if the default values are not going to be used. 5. Which function in R language is used to find out whether the means of 2 groups are equal to each other or not?
A. f.tests () B. l.tests () C. t.tests () D. p.tests () View Answer Ans : C
Explanation: t.tests () function in R language is used to find out whether the means of 2 groups are equal to each other. It is not used most commonly in R. It is used in some specific conditions. 6. What will be the output of log (-5.8) when executed on R console?
A. NA B. NAN C. 0.213 D. Error View Answer Ans : B
Explanation: Executing the above on the R console or terminal will display a warning sign that NaN (Not a Number) will be produced, because it is not possible to take the log of a negative number. 7. Which function is preferred over sapply, as vapply allows the programmer to specify the output type?
A. Lapply B. Japply C. Vapply D. Zapply View Answer
Ans : C
Explanation: Vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use. simplify2array() is the utility called from sapply() when simplify is not false and is similarly called from mapply(). 8. How will you check if an element is present in a vector?
A. Match() B. Dismatch() C. Mismatch() D. Search() View Answer Ans : A
Explanation: It can be done using the match () function- match () function returns the first appearance of a particular element. The other way is to use %in% which returns a Boolean value either true or false. 9. You can check to see whether an R object is NULL with the _________ function.
A. is.null() B. is.nullobj() C. null() D. as.nullobj() View Answer Ans : A
Explanation: It is sometimes useful to allow an argument to take the NULL value, which might indicate that the function should take some specific action. 10. In the base graphics system, which function is used to add elements to a plot?
A. Boxplot() B. Text() C. Treat() D. Both A and B View Answer Ans : D
Explanation: In the base graphics system, boxplot or text function is used to add elements to a plot. 1. Which of the following syntax is used to install forecast package?
A. install.pack("forecast") B. install.packages("cast") C. install.packages("forecast") D. install.pack("forecastcast") View Answer Ans : C
Explanation: forecast is used for time series analysis 2. Which splits a data frame and returns a data frame?
A. apply B. ddply
C. stats D. plyr View Answer Ans : B
Explanation: ddply splits a data frame and returns a data frame. 3. Which of the following is an R package for the exploratory analysis of genetic and genomic data?
A. adeg B. adegenet C. anc D. abd View Answer Ans : B
Explanation: This package contains Classes and functions for genetic data analysis within the multivariate framework. 4. Which of the following contains functions for processing uniaxial minute-to-minute accelerometer data?
A. accelerometry B. abc C. abd D. anc View Answer Ans : A
Explanation: This package contains a collection of functions that perform operations on time-series accelerometer data, such as identifying non-wear time, flagging minutes that are part of an activity bout, and finding the maximum 10-minute average count value. 5. ______ Uses Grieg-Smith method on 2 dimensional spatial data.
A. G.A. B. G2db C. G.S. D. G1DBN View Answer Ans : C
Explanation: The function returns a GriegSmith object which is a matrix with block sizes, sum of squares for each block size as well as mean sums of squares. G1DBN is a package performing Dynamic Bayesian Network Inference. 6. Which of the following packages provides namespace management functions not yet present in base R?
A. stringr B. nbpMatching C. messagewarning D. namespace View Answer Ans : D
Explanation: The package namespace is one of the most confusing parts of building a package. nbpMatching contains functions for non-bipartite optimal matching. 7. What will be the output of the following R code?
install.packages(c("devtools", "roxygen2"))
A. Develops the tools B. Installs the given packages C. Exits R studio D. Nothing happens View Answer Ans : B
Explanation: Make sure you have the latest version of R and then run the above code to get the packages you’ll need. It installs the given packages. Confirm that you have a recent version of RStudio. 8. A bundled package is a package that’s been compressed into a ______ file.
A. Double B. Single C. Triple D. No File View Answer Ans : B
Explanation: A bundled package is a package that’s been compressed into a single file. A source package is just a directory with components like R/, DESCRIPTION, and so on. 9. .library() is not useful when developing a package since you have to install the package first.
A. TRUE B. FALSE C. Can be true or false D. Can not say View Answer Ans : A
Explanation: library() is not useful when developing a package since you have to install the package first. A library is a simple directory containing installed packages.
10. DESCRIPTION uses a very simple file format called DCF.
A. TRUE B. FALSE C. Can be true or false D. Can not say View Answer Ans : A
Explanation: DESCRIPTION uses a very simple file format called DCF, the Debian control format. When you first start writing packages, you’ll mostly use these metadata to record what packages are needed to run your package.
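To close this set, an illustrative sketch of the basic package workflow the questions refer to (the package names are taken from the questions themselves):
# Illustrative only: installing and loading packages mentioned above
install.packages(c("forecast", "plyr"))   # fetch and install from CRAN
library(forecast)                         # attach an installed package for use
library(plyr)
ls("package:forecast")                    # list the functions a package exports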
37. While installing Hadoop, how many XML files are edited, and which are they? 1. core-site.xml 2. hdfs-site.xml 3. mapred.xml 4. yarn.xml Ans. core-site.xml
1.1 Information is a. Data b. Processed Data c. Manipulated input d. Computer output 1.2 Data by itself is not useful unless a. It is massive b. It is processed to obtain information c. It is collected from diverse sources d. It is properly stated 1.3 For taking decisions data must be a Very accurate b Massive c Processed correctly d Collected from diverse sources 1.4 Strategic information is needed for a Day to day operations b Meet government requirements c Long range planning d Short range planning 1.5 Strategic information is required by a Middle managers b Line managers c Top managers d All workers 1.6 Tactical information is needed for a Day to day operations b Meet government requirements c Long range planning d Short range planning 1.7 Tactical information is required by
a Middle managers b Line managers c Top managers d All workers 1.8 Operational information is needed for a Day to day operations b Meet government requirements c Long range planning d Short range planning 1.9 Operational information is required by a Middle managers b Line managers c Top managers d All workers 1.10 Statutory information is needed for a Day to day operations b Meet government requirements c Long range planning d Short range planning 1.11 In motor car manufacturing the following type of information is strategic a Decision on introducing a new model b Scheduling production c Assessing competitor car d Computing sales tax collected 1.12 In motor car manufacturing the following type of information is tactical a Decision on introducing a new model b Scheduling production c Assessing competitor car d Computing sales tax collected
1.13 In motor car manufacturing the following type of information is operational
a Decision on introducing a new model b Scheduling production c Assessing competitor car d Computing sales tax collected 1.14 In motor car manufacturing the following type of information is statutory a Decision on introducing a new model b Scheduling production c Assessing competitor car d Computing sales tax collected 1.15 In a hospital information system the following type of information is strategic a Opening a new children’s ward b Data on births and deaths c Preparing patients’ bill d Buying an expensive diagnostic system such as CAT scan 1.16 In a hospital information system the following type of information is tactical a Opening a new children’s’ ward b Data on births and deaths c Preparing patients’ bill d Buying an expensive diagnostic system such as CAT scan 1.17 In a hospital information system the following type of information is operational a Opening a new children’s’ ward b Data on births and deaths c Preparing patients’ bill d Buying an expensive diagnostic system such as CAT scan 1.18 In a hospital information system the following type of information is statutory a Opening a new children’s’ ward b Data on births and deaths c Preparing patients’ bill d Buying an expensive diagnostic system such as CAT scan
1.19 A computer based information system is needed because (i) The size of organization have become large and data is massive (ii) Timely decisions are to be taken based on available data (iii) Computers are available (iv) Difficult to get clerks to process data a (ii) and (iii) b (i) and (ii) c (i) and (iv) d (iii) and (iv) 1.20 Volume of strategic information is a Condensed b Detailed c Summarized d Irrelevant 1.21 Volume of tactical information is a Condensed b Detailed c Summarized d relevant 1.22 Volume of operational information is a Condensed b Detailed c Summarized d Irrelevant 1.23 Strategic information is a Haphazard b Well organized c Unstructured d Partly structured 1.24 Tactical information is a Haphazard
b Well organized c Unstructured d Partly structured 1.25 Operational information is a Haphazard b Well organized c Unstructured d Partly structured 1.26 Match and find best pairing for a Human Resource Management System (i) Policies on giving bonus (iv) Strategic information (ii) Absentee reduction (v) Tactical information (iii) Skills inventory (vi) Operational information a (i) and (v) b (i) and (iv) c (ii) and (iv) d (iii) and (v) 1.27 Match and find best pairing for a Production Management System (i) Performance appraisal of machines to decide on replacement (iv) Strategic information (ii) Introducing new production technology (v) Tactical information (iii) Preventive maintenance schedules for machines (vi) Operational information a (i) and (vi) b (ii) and (v) c (i) and (v) d (iii) and (iv) 1.28 Match and find best pairing for a Production Management System (i) Performance appraisal of machines to decide on replacement (iv) Strategic information (ii) Introducing new production technology (v) Tactical information
(iii) Preventive maintenance schedules for machines (vi) Operational information a (iii) and (vi) b (i) and (iv) c (ii) and (v) d None of the above 1.29 Match and find best pairing for a Materials Management System (i) Developing vendor performance measures (iv) Strategic information (ii) Developing vendors for critical items (v) Tactical information (iii) List of items rejected from a vendor (vi) Operational information a (i) and (v) b (ii) and (v) c (iii) and (iv) d (ii) and (vi) 1.30 Match and find best pairing for a Materials Management System (i) Developing vendor performance measures (iv) Strategic information (ii) Developing vendors for critical items (v) Tactical information (iii) List of items rejected from a vendor (vi) Operational information a (i) and (iv) b (i) and (vi) c (ii) and (iv) d (iii) and (v) 1.31 Match and find best pairing for a Materials Management System (i) Developing vendor performance measures (iv) Strategic information (ii) Developing vendors for critical items (v) Tactical information (iii) List of items rejected from a vendor (vi) Operational information a (i) and (vi) b (iii) and (vi) c (ii) and (vi) d (iii) and (iv)
1.32 Match and find best pairing for a Finance Management System (i) Tax deduction at source report (iv) Strategic information (ii) Impact of taxation on pricing (v) Tactical information (iii) Tax planning (vi) Operational information a (i) and (v) b (iii) and (vi) c (ii) and (v) d (ii) and (iv) 1.33 Match and find best pairing for a Finance Management System (i) Budget status to all managers (iv) Strategic information (ii) Method of financing (v) Tactical information (iii) Variance between budget and expenses (vi) Operational information a (i) and (v) b (iii) and (vi) c (ii) and (v) d (ii) and (iv) 1.34 Match and find best pairing for a Marketing Management System (i) Customer preferences surveys (iv) Strategic information (ii) Search for new markets (v) Tactical information (iii) Performance of sales outlets (vi) Operational information a (i) and (iv) b (ii) and (v) c (iii) and (vi) d (ii) and (v) 1.35 Match and find best pairing for a Marketing Management System (i) Customer preferences surveys (iv) Strategic information (ii) Search for new markets (v) Tactical information (iii) Performance of sales outlets (vi) Operational information a (iii) and (iv) b (i) and (vi) c (i) and (v)
d (iii) and (v) 1.36 Match and find best pairing for a Research and Development Management System (i) Technical collaboration decision (iv) Strategic information (ii) Budgeted expenses vs actuals (v) Tactical information (iii) Proportion of budget to be allocated to various projects (vi) Operational information a (i) and (iv) b (ii) and (v) c (iii) and (vi) d (iii) and (iv) 1.37 Match and find best pairing for a Research and Development Management System (i) Technical collaboration decision (iv) Strategic information (ii) Budgeted expenses vs actuals (v) Tactical information (iii) Proportion of budget to be allocated to various projects (vi) Operational information a (i) and (v) b (iii) and (v) c (ii) and (v) d (i) and (vi) 1.38 Organizations are divided into departments because a it is convenient to do so b each department can be assigned a specific functional responsibility c it provides opportunities for promotion d it is done by every organization 1.39 Organizations have hierarchical structures because a it is convenient to do so b it is done by every organization c specific responsibilities can be assigned for each level d it provides opportunities for promotions
1.40 Which of the following functions is the most unlikely in an insurance company. a Training b giving loans c bill of material d accounting 1.41 Which of the following functions is most unlikely in a university a admissions b accounting c conducting examination d marketing 1.42 Which of the following functions is most unlikely in a purchase section of an organization. a Production planning b order processing c vendor selection d training 1.43 Which is the most unlikely function of a marketing division of an organization. a advertising b sales analysis c order processing d customer preference analysis 1.44 Which is the most unlikely function of a finance section of a company. a Billing b costing c budgeting d labor deployment 1.45 Match quality of information and how it is ensured using the following list QUALITY HOW ENSURED (i) Accurate (iv) Include all data
(ii) Complete (v) Use correct input and processing rules (iii)Timely (vi) Include all data up to present time a (i) and (v) b (ii) and (vi) c (iii) and (vi) d (i) and (iv) 1.46 Match quality of information and how it is ensured using the following list QUALITY HOW ENSURED (i) Accurate (iv) Include all data (ii) Complete (v) Use correct input and processing rules (iii) Timely (vi) Include all data up to present time a (ii) and (v) b (ii) and (vi) c (ii) and (iv) d (iii) and (iv) 1.47 Match quality of information and how it is ensured using the following list
QUALITY HOW ENSURED (i) Up-to-date (iv) Include all data to present time (ii) Brief (v) Give at right time (iii) Significance (vi) Use attractive format and understandable graphical charts
a (i) and (v) b (ii) and (vi) c (iii) and (vi) d (i) and (vi) 1.48 Match quality of information and how it is ensured using the following list QUALITY HOW ENSURED (i)Up- to-date (iv) Include all data to present time (ii)Brief (v) Give at right time
(iii) Significance (vi) Use attractive format and understandable graphical charts a (i) and (iv) b (ii) and (v) c (iii) and (iv) d (ii) and (iv) 1.49 Match quality of information and how it is ensured using the following list QUALITY HOW ENSURED (i)Brief (iv) Unpleasant information not hidden (ii)Relevant (v) Summarize relevant information (iii) Trustworthy (vi) Understands user needs a (i) and (iv) b (ii) and (v) c (iii) and (vi) d (i) and (v) 1.50 Match quality of information and how it is ensured using the following list QUALITY HOW ENSURED (i)Brief (iv) Unpleasant information not hidden (ii)Relevant (v) Summarize relevant information (iii)Trustworthy (vi) Understands user needs a (ii) and (vi) b (i) and (iv) c (iii) and (v) d (ii) and (iv) 1.51 The quality of information which does not hide any unpleasant information is known as a Complete b Trustworthy c Relevant d None of the above 1.52 The quality of information which is based on understanding user needs
a Complete b Trustworthy c Relevant d None of the above 1.53 Every record stored in a Master file has a key field because a it is the most important field b it acts as a unique identification of record c it is the key to the database d it is a very concise field 1.54 The primary storage medium for storing archival data is a floppy disk b magnetic disk c magnetic tape d CD- ROM 1.55 Master files are normally stored in a a hard disk b a tape c CD – ROM d computer’s main memory 1.56 Master file is a file containing a all master records b all records relevant to the application c a collection of data items d historical data of relevance to the organization 1.57 Edit program is required to a authenticate data entered by an operator b format correctly input data c detect errors in input data d expedite retrieving input data 1.58 Data rejected by edit program are a corrected and re- entered
b removed from processing c collected for later use d ignored during processing 1.59 Online transaction processing is used because a it is efficient b disk is used for storing files c it can handle random queries. d Transactions occur in batches 1.60 On-line transaction processing is used when i) it is required to answer random queries ii) it is required to ensure correct processing iii) all files are available on-line iv) all files are stored using hard disk a i ,ii b i, iii c ii ,iii, iv d i , ii ,iii 1.61 Off-line data entry is preferable when i) data should be entered without error ii) the volume of data to be entered is large iii) the volume of data to be entered is small iv) data is to be processed periodically a i, ii b ii, iii c ii, iv d iii, iv 1.62 Batch processing is used when i) response time should be short ii) data processing is to be carried out at periodic intervals iii) transactions are in batches iv) transactions do not occur periodically
a i ,ii b i ,iii,iv c ii ,iii d i , ii ,iii 1.63 Batch processing is preferred over on-line transaction processing when i) processing efficiency is important ii) the volume of data to be processed is large iii) only periodic processing is needed iv) a large number of queries are to be processed a i ,ii b i, iii c ii ,iii d i , ii ,iii 1.64 A management information system is one which a is required by all managers of an organization b processes data to yield information of value in tactical management c provides operational information d allows better management of organizations 1.65 Data mining is used to aid in a operational management b analyzing past decision made by managers c detecting patterns in operational data d retrieving archival data 1.66 Data mining requires a large quantities of operational data stored over a period of time b lots of tactical data c several tape drives to store archival data d large mainframe computers 1.67 Data mining can not be done if a operational data has not been archived b earlier management decisions are not available
c the organization is large d all processing had been only batch processing 1.68 Decision support systems are used for a Management decision making b Providing tactical information to management c Providing strategic information to management d Better operation of an organization 1.69 Decision support systems are used by a Line managers. b Top-level managers. c Middle level managers. d System users 1.70 Decision support systems are essential for a Day–to-day operation of an organization. b Providing statutory information. c Top level strategic decision making. d Ensuring that organizations are profitable.
Key to Objective Questions
1.1 b 1.2 b 1.3 c 1.4 c 1.5 c 1.6 d
1.7 a 1.8 a 1.9 b 1.10 b 1.11 a 1.12 c
1.13 b 1.14 d 1.15 d 1.16 a 1.17 c 1.18 b
1.19 b 1.20 a 1.21 c 1.22 b 1.23 c 1.24 d
1.25 b 1.26 b 1.27 c 1.28 a 1.29 a 1.30 c
1.31 b 1.32 c 1.33 d 1.34 c 1.35 c 1.36 a
1.37 b 1.38 b 1.39 c 1.40 c 1.41 d 1.42 a
1.43 c 1.44 d 1.45 a 1.46 c 1.47 c 1.48 a
1.49 d 1.50 a 1.51 b 1.52 c 1.53 b 1.54 c
1.55 a 1.56 b 1.57 c 1.58 a 1.59 c 1.60 b
1.61 c 1.62 c 1.63 d 1.64 b 1.65 c 1.66 a
1.67 a 1.68 c 1.69 b 1.70 c