Data Analytics MCQ 15000+
Data Analytics
Unit 1:
1. Data Analysis is a process of?
A. inspecting data B. cleaning data C. transforming data D. All of the above
2. Which of the following is not a major data analysis approach?
A. Data Mining B. Predictive Intelligence C. Business Intelligence D. Text Analytics
3. How many main statistical methodologies are used in data analysis?
A. 2 B. 3 C. 4 D. 5
4. In descriptive statistics, data from the entire population or a sample is summarized with?
A. integer descriptors B. floating descriptors C. numerical descriptors D. decimal descriptors
5. Data Analysis was defined by which statistician?
A. William S. B. Hans Peter Luhn C. Gregory Piatetsky-Shapiro D. John Tukey
6. Which of the following is true about hypothesis testing?
A. answering yes/no questions about the data B. estimating numerical characteristics of the data C. describing associations within the data D. modeling relationships within the data
7. The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
A. TRUE B. FALSE C. Can be true or false D. Can not say
8. The branch of statistics which deals with development of particular statistical methods is classified as
A. industry statistics B. economic statistics C. applied statistics D. applied statistics
9. Which of the following is true about regression analysis?
A. answering yes/no questions about the data B. estimating numerical characteristics of the data C. modeling relationships within the data D. describing associations within the data
10. Text Analytics is also referred to as Text Mining.
A. TRUE B. FALSE C. Can be true or false D. Can not say
11. In an Internet context, this is the practice of tailoring Web pages to individual users’ characteristics or preferences. 1. Web services 2. customer-facing 3. client/server 4. personalization
12. This is the processing of data about customers and their relationship with the enterprise in order to improve the enterprise’s future sales and service and lower cost. 1. clickstream analysis 2. database marketing 3. customer relationship management 4. CRM analytics
13. This is a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions. 1. best practice 2. data mart
3. business information warehouse 4. business intelligence
14. This is a systematic approach to the gathering, consolidation, and processing of consumer data (both for customers and potential customers) that is maintained in a company’s databases 1. database marketing 2. marketing encyclopedia 3. application integration 4. service oriented integration
15. This is an arrangement in which a company outsources some or all of its customer relationship management functions to an application service provider (ASP). 1. spend management 2. supplier relationship management 3. hosted CRM 4. Customer Information Control System
16. What are the five V’s of Big Data? 1. Volume 2. velocity 3. Variety 4. All of the above
17. ____ hides the limitations of Java behind a powerful and concise Clojure API for Cascading. 1. Scalding 2. Cascalog 3. Hcatalog 4. Hcalding
18. What are the main components of Big Data? 1. MapReduce 2. HDFS 3. YARN
4. All of these
19. What are the different features of Big Data Analytics? 1. Open-Source 2. Scalability 3. Data Recovery 4. All the above
20. Define the Port Numbers for NameNode, Task Tracker and Job Tracker 1. NameNode 2. Task Tracker 3. Job Tracker 4. All of the above
21. Facebook Tackles Big Data With ____ based on Hadoop 1. Project Prism 2. Prism 3. ProjectData 4. ProjectBid
22. Which of the following is not a phase of Data Analytics Life Cycle? 1. Communication 2. Recall 3. Data Preparation 4. Model Planning
UNIT 2: DATA ANALYSIS
1. In regression, the equation that describes how the response variable (y) is related to the explanatory variable (x) is: a. the correlation model b. the regression model c. used to compute the correlation coefficient d. None of these alternatives is correct.
2. The relationship between number of beers consumed (x) and blood alcohol content (y) was studied in 16 male college students by using least squares regression. The following regression equation was obtained from this study: ŷ = -0.0127 + 0.0180x. The above equation implies that:
a. each beer consumed increases blood alcohol by 1.27%
b. on average it takes 1.8 beers to increase blood alcohol content by 1%
c. each beer consumed increases blood alcohol by an average amount of 1.8%
d. each beer consumed increases blood alcohol by exactly 0.018
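A quick sanity check of the slope interpretation in Question 2 (a minimal Python sketch; the intercept and slope are taken from the question itself, not from any real dataset):

```python
# Fitted least-squares line from the question: y_hat = -0.0127 + 0.0180 * x
# (x = beers consumed, y = blood alcohol content)
def predict_bac(beers):
    return -0.0127 + 0.0180 * beers

# The change in predicted BAC for one extra beer equals the slope:
print(round(predict_bac(4) - predict_bac(3), 4))   # 0.018, i.e. about 1.8% on average
```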
3. SSE can never be
a. larger than SST
b. smaller than SST
c. equal to 1
d. equal to zero
4. Regression modeling is a statistical framework for developing a mathematical equation that describes how
a. one explanatory and one or more response variables are related
b. several explanatory and several response variables response are related
c. one response and one or more explanatory variables are related
d. All of these are correct.
5. In regression analysis, the variable that is being predicted is the
a. response, or dependent, variable
b. independent variable
c. intervening variable
d. is usually x
6. Regression analysis was applied to return rates of sparrowhawk colonies, studying the relationship between return rate (x: % of birds that return to the colony in a given year) and immigration rate (y: % of new adults that join the colony per year). The following regression equation
was obtained: ŷ = 31.9 – 0.34x. Based on the above estimated regression equation, if the return rate were to decrease by 10% the rate of immigration to the colony would:
a. increase by 34%
b. increase by 3.4%
c. decrease by 0.34%
d. decrease by 3.4%
7. In least squares regression, which of the following is not a required assumption about the error term ε?
a. The expected value of the error term is one.
b. The variance of the error term is the same for all values of x.
c. The values of the error term are independent.
d. The error term is normally distributed.
8. Larger values of r² (R²) imply that the observations are more closely grouped about the
a. average value of the independent variables
b. average value of the dependent variable
c. least squares line
d. origin
9. In a regression analysis if r² = 1, then
a. SSE must also be equal to one
b. SSE must be equal to zero
c. SSE can be any positive value
d. SSE must be negative
10. Which type of multivariate analysis should be used when a researcher wants to reduce a set of variables to a smaller set of composite variables by identifying underlying dimensions of the data?
A)Conjoint analysis
B)Cluster analysis
C)Multiple regression analysis
D)Factor analysis
11. Which type of multivariate analysis should be used when a researcher wants to estimate the utility that consumers associate with different product features?
A)Conjoint analysis
B)Cluster analysis
C)Multiple regression analysis
D)Factor analysis
12. Which type of multivariate analysis should be used when a researcher wants to identify subgroups of individuals that are homogeneous within subgroups and different from other subgroups?
A)Conjoint analysis
B)Cluster analysis
C)Multiple regression analysis
D)Factor analysis
13. Which type of multivariate analysis should be used when a researcher wants to predict group membership on the basis of two or more independent variables?
A)Conjoint analysis
B)Cluster analysis
C)Multiple regression analysis
D)Multiple discriminant analysis
14. Support vector machine (SVM) is a _________ classifier. a) Discriminative b) Generative
15. SVM can be used to solve ___________ problems. a) Classification b) Regression c) Clustering d) Both Classification and Regression
16. SVM is a ___________ learning algorithm. a) Supervised b) Unsupervised
17. SVM is termed as a ________ classifier. a) Minimum margin b) Maximum margin
18. The training examples closest to the separating hyperplane are called _______ a) Training vectors b) Test vectors
19. A factor analysis is…, while a principal components analysis is…
A) A broad term, the most commonly used technique for doing factor analysis.
B) The most commonly used technique for doing factor analysis, a broad term.
C) Both of the above
D) None of the above
20. Dimension Reduction is defined as:
A) It is a process of converting a data set having vast dimensions into a data set with lesser dimensions.
B) It ensures that the converted data set conveys similar information concisely.
C) All of the above
D) None of the above
21. What is the form of Fuzzy logic? a) Two-valued logic b) Crisp set logic c) Many-valued logic d) Binary set logic
22. Traditional set theory is also known as Crisp Set theory. a) True b) False
23. The truth values of traditional set theory is ____________ and that of fuzzy set is __________ a) Either 0 or 1, between 0 & 1 b) Between 0 & 1, either 0 or 1 c) Between 0 & 1, between 0 & 1 d) Either 0 or 1, either 0 or 1
24. Fuzzy logic is extension of Crisp set with an extension of handling the concept of Partial Truth. a) True b) False
25. The room temperature is hot. Here the hot (use of linguistic variable is used) can be represented by _______ a) Fuzzy Set b) Crisp Set c) Fuzzy & Crisp Set
d) None of the mentioned
26. The values of the set membership is represented by ___________ a) Discrete Set b) Degree of truth c) Probabilities d) Both Degree of truth & Probabilities
27. Japanese were the first to utilize fuzzy logic practically on high-speed trains in Sendai. a) True b) False
28. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the following. a) AND b) OR c) NOT d) All of the mentioned
29. There are also other operators, more linguistic in nature, called __________ that can be applied to fuzzy set theory. a) Hedges b) Lingual Variable c) Fuzz Variable d) None of the mentioned
30. Fuzzy logic is usually represented as ___________ a) IF-THEN-ELSE rules b) IF-THEN rules c) Both IF-THEN-ELSE rules & IF-THEN rules d) None of the mentioned
31. Like relational databases, there also exist fuzzy relational databases. a) True b) False
32. ______________ is/are the way/s to represent uncertainty. a) Fuzzy Logic b) Probability c) Entropy d) All of the mentioned
33. ____________ are algorithms that learn from their more complex environments (hence eco) to generalize, approximate and simplify solution logic. a) Fuzzy Relational DB b) Ecorithms c) Fuzzy Set d) None of the mentioned
Unit 3:
1 : What do you mean by sampling of stream data?
1. Sampling reduces the amount of data fed to a subsequent data mining algorithm. 2. Sampling reduces the diversity of the data stream 3. Sampling aims to keep statistical properties of the data intact. 4. Sampling algorithms often don't need multiple passes over the data
Question 2 : If a distance measure satisfies d(x, y) = d(y, x), then it is called
1. Symmetric 2. identical 3. positiveness 4. triangle inequality
Question 3 : NOSQL is
1. Not only SQL 2. Not SQL 3. Not Over SQL 4. No SQL
Question 4 : Find the L1 and L2 distances between the points (5, 6, 7) and (8, 2, 4).
1. L1 =10 , L2 = 5.83 2. L1 =10 , L2 = 5 3. L1 =11 , L2 = 4.9
4. L1 =9 , L2 = 5.83
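A minimal check of Question 4 using only the Python standard library:

```python
import math

a = (5, 6, 7)
b = (8, 2, 4)

l1 = sum(abs(x - y) for x, y in zip(a, b))               # Manhattan (L1) distance
l2 = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))  # Euclidean (L2) distance

print(l1, round(l2, 2))   # 10 5.83
```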
Question 5 : The time between elements of one stream
1. need not be uniform 2. need to be uniform 3. must be 1ms. 4. must be 1ns
Question 6 : A Reduce task receives
1. one or more keys and their associated value list 2. key value pair 3. list of keys and their associated values 4. list of key value pairs
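To illustrate Question 6: the framework groups all values emitted for a key, so each Reduce call receives one key plus its associated value list. Below is a toy single-process word-count sketch; the helper names are made up for illustration and are not part of the Hadoop API.

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit (word, 1) for every word
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    # Group values by key, as the framework does before the reduce phase
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped.items()

def reduce_phase(key, values):
    # Reducer: receives one key and its associated value list
    return key, sum(values)

lines = ["big data big ideas", "data streams"]
print([reduce_phase(k, v) for k, v in shuffle(map_phase(lines))])
# [('big', 2), ('data', 2), ('ideas', 1), ('streams', 1)]
```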
Question 7 : Which of the following statements about data streaming is true?
1. Stream data is always unstructured data. 2. Stream data often has a high velocity. 3. Stream elements cannot be stored on disk. 4. Stream data is always structured data.
Question 8 : Hadoop is the solution for:
1. Database software 2. Big Data Software 3. Data Mining software 4. Distribution software
Question 9 : ETL stands for ________________
1. Extraction transformation and loading 2. Extract Taken Lend 3. Enterprise Transfer Load 4. Entertainment Transference Load
Question 10 : “Sharding” a database across many server instances can be achieved with _______________
1. MAN 2. LAN 3. WAN 4. SAN
Question 11 : Neo4j is an example of which of the following NoSQL architectural pattern?
1. Key-value store 2. Graph Store 3. Document Store 4. Column-based Store
Question 12 : CSV and JSON can be described as
1. Structured data 2. Unstructured data 3. Semi-structured data 4. Multi-structured data
Question 13 : The hardware term used to describe Hadoop hardware requirements is
1. Commodity firmware
2. Commodity software 3. Commodity hardware 4. Cluster hardware
Question 14 : Which of the following is not a Hadoop Distribution?
1. MAPR 2. Cloudera 3. Hortonworks 4. RMAP
Question 15 : Which of the following operations can be implemented with Combiners?
1. Selection 2. Projection 3. Natural Join 4. Union
Question 16 : ________ stores are used to store information about networks, such as social connections.
1. Key-value 2. Wide-column 3. Document 4. graph
Question 17 : The DGIM algorithm was developed to estimate the count of 1's that occur within the last k bits of a stream window N. Which of the following statements is true about the estimate of the number of 0's based on DGIM?
1. The number of 0's cannot be estimated at all. 2. The number of 0's can be estimated with a maximum guaranteed error
3. To estimate the number of 0's and 1's with a guaranteed maximum error, DGIM has to be employed twice, once creating buckets based on 1's and once creating buckets based on 0's. 4. Determine whether an element has already occurred in previous stream data.
Question 18 : If size of file is 4 GB and block size is 64 MB then number of mappers required for MapReduce task is
1. 8 2. 16 3. 32 4. 64
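The arithmetic behind Question 18, assuming one map task per input split and a split size equal to the 64 MB block size:

```python
file_size_mb = 4 * 1024       # 4 GB expressed in MB
block_size_mb = 64
print(file_size_mb // block_size_mb)   # 64 map tasks
```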
Question 19 : Which of the following is not the default daemon of Hadoop?
1. Namenode 2. Datanode 3. Job Tracker 4. Job history server
Question 20 : In Bloom filter an array of n bits is initialized with
1. all 0s 2. all 1s 3. half 0s and half 1s 4. all -1
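A minimal Bloom-filter sketch for Question 20, showing the bit array initialized to all 0s. The way the k bit positions are derived here is purely illustrative, not what any particular library does.

```python
import hashlib

N_BITS = 64
bits = [0] * N_BITS          # an array of n bits, initialized with all 0s

def positions(item, k=3):
    # Derive k bit positions from one digest (illustrative choice of hashes)
    digest = hashlib.sha256(item.encode()).digest()
    return [digest[i] % N_BITS for i in range(k)]

def add(item):
    for p in positions(item):
        bits[p] = 1

def might_contain(item):
    # False positives are possible; false negatives are not
    return all(bits[p] == 1 for p in positions(item))

add("alice@example.com")
print(might_contain("alice@example.com"))   # True
print(might_contain("bob@example.com"))     # almost certainly False
```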
Question 21 : _____________is a batch-based, distributed computing framework modeled after Google’s paper.
1. MapCompute 2. MapReuse 3. MapCluster
4. MapReduce
Question 22 : What is the edit distance between A=father and B=feather ?
1. 5 2. 1 3. 4 4. 2
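A standard dynamic-programming edit distance for Question 22. With insertions, deletions and substitutions allowed (and also with the insert/delete-only definition used for streams), turning "father" into "feather" needs a single insertion of 'e', so the distance is 1.

```python
def edit_distance(a, b):
    # dp[i][j] = edits needed to turn a[:i] into b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete
                           dp[i][j - 1] + 1,         # insert
                           dp[i - 1][j - 1] + cost)  # match / substitute
    return dp[-1][-1]

print(edit_distance("father", "feather"))   # 1
```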
Question 23 : Sliding window operations typically fall in the category
1. OLTP Transactions 2. Big Data Batch Processing 3. Big Data Real Time Processing 4. Small Batch Processing
Question 24 : _________ systems focus on the relationship between users and items for recommendation.
1. DGIM 2. Collaborative-Filtering 3. Content Based and Collaborative Filtering 4. Content Based
Question 25 : Find Hamming Distance for vectors A=100101011 B=100010010
1. 2 2. 4 3. 3 4. 1
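Question 25 worked out directly from the two bit strings in the question:

```python
a = "100101011"
b = "100010010"
hamming = sum(x != y for x, y in zip(a, b))   # count positions where the bits differ
print(hamming)   # 4
```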
Question 26 : During start up, the ___________ loads the file system state from the fsimage and the edits log file.
1. Datanode 2. Namenode 3. Secondary Namenode 4. Rack awareness policy
Question 27 : What is finally produced by Hierarchical Agglomerative Clustering?
1. final estimate of cluster centroids 2. assignment of each point to clusters 3. tree showing how close things are to each other 4. Group of clusters
Question 28 : The Jaccard similarity of two non-binary sets A and B is defined by __________
1. Jaccard Index 2. Primary Index 3. Secondary Index 4. Clustered Index
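For Question 28: the Jaccard index of two sets is the size of their intersection divided by the size of their union. A small sketch with made-up example sets:

```python
def jaccard(a, b):
    return len(a & b) / len(a | b)

print(jaccard({1, 2, 3, 4}, {2, 3, 5}))   # 2 / 5 = 0.4
```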
Question 29 : Which of the following is based on the grid-like street geography of New York?
1. Manhattan Distance 2. Edit Distance 3. Hamming distance 4. Lp distance
Question 30 : The FM-sketch algorithm can be used to:
1. Estimate the number of distinct elements. 2. Sample data with a time-sensitive window. 3. Estimate the frequent elements. 4. Determine whether an element has already occurred in previous stream data.
Question 31 : Pick a hash function h that maps each of the N elements to at least log2 N bits. If R is the maximum number of trailing 0's observed in the hashed values, the estimated number of distinct elements is
1. 2^R 2. 2^(-R) 3. 1-(2^R) 4. 1-(2^(-R))
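A toy Flajolet-Martin style sketch for Question 31: hash each element, keep the maximum number of trailing zero bits R seen so far, and estimate the number of distinct elements as 2^R. The choice of hash here is only illustrative, and a single estimator like this is very noisy in practice.

```python
import hashlib

def trailing_zeros(n):
    count = 0
    while n > 0 and n % 2 == 0:
        n //= 2
        count += 1
    return count

def fm_estimate(stream):
    r = 0
    for element in stream:
        h = int(hashlib.md5(str(element).encode()).hexdigest(), 16)
        r = max(r, trailing_zeros(h))
    return 2 ** r   # estimated number of distinct elements

print(fm_estimate(["a", "b", "c", "a", "b", "d"]))
```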
Question 32 : Which of the following is not a characteristic of stream data?
1. Continuous 2. ordered 3. persistent 4. huge
Question 33 : Which of the following is a column-oriented database that runs on top of HDFS
1. Hive 2. Sqoop 3. Hbase 4. Flume
Question 34 : Which of the following decides the number of partitions that are created on the local file system of the worker nodes?
1. Number of map tasks 2. Number of reduce tasks 3. Number of file input splits 4. Number of distinct keys in the intermediate key-value pairs
Question 35 : Which of the following is not the class of points in BFR algorithm
1. Discard Set (DS) 2. Compression Set (CS) 3. Isolation Set (IS) 4. Retained Set (RS)
Question 36 : Which of the following is not one of the 5 V's of Big Data?
1. Volume 2. Variable 3. Velocity 4. Value
Question 37 : Which algorithm is used to find fully connected subgraphs in social media mining?
1. CURE 2. CPM 3. SimRank 4. Girvan-Newman Algorithm
Question 38 : A ________________ query Q is a query that is issued once over a database D, and then logically runs continuously over the data in D until Q is terminated.
1. One-time Query 2. Standing Query 3. Adhoc Query
4. General Query
Question 39 : What is the effect of a spider trap on PageRank?
1. A particular page gets the highest PageRank 2. All the pages of the web will get 0 PageRank 3. no effect on any page 4. affects a particular set of pages
Question 40 : Which of the following is correct option for MongoDB
1. MongoDB is column oriented data store 2. MongoDB uses XML more in comparison with JSON 3. MongoDB is a document store database 4. MongoDB is a key-value data store
Question 41 : _________ systems focus on the relationship between users and items for recommendation.
1. DGIM 2. Collaborative-Filtering 3. Content Based and Collaborative Filtering 4. Content Based
Question 42 : The graphical representation of an SNA is made up of links and _____________.
1. People 2. Networks 3. Nodes 4. Computers
Question 43 : Hadoop is a framework that works with a variety of related tools. Common Hadoop ecosystem tools include ____________
1. MapReduce, Hummer and Iguana 2. MapReduce, Hive and HBase 3. MapReduce, MySQL and Google Apps 4. MapReduce, Heron and Trumpet
Question 44 : Which of the following statements about data streaming is true?
1. Stream data is always unstructured data. 2. Stream data often has a high velocity. 3. Stream elements cannot be stored on disk. 4. Stream data is always structured data.
Question 45 : Which of the following is a NoSQL Database Type ?
1. SQL 2. JSON 3. Document databases 4. CSV
Question 46 : Techniques for fooling search engines into believing your page is about something it is not, are called _____________.
1. term spam 2. page rank 3. phishing 4. dead ends
Question 47 : The police set up checkpoints at randomly selected road locations, then inspected every driver at those locations. What type of sample is this?
1. Simple Random Sample 2. Stratified Random Sample 3. Cluster Random Sample 4. Uniform sampling
Question 48 : Which of the following statements about standard Bloom filters is correct?
1. It is possible to delete an element from a Bloom filter. 2. A Bloom filter always returns the correct result. 3. It is possible to alter the hash functions of a full Bloom filter to create more space. 4. A Bloom filter always returns TRUE when testing for a previously added element.
Question 49 : Which of the following is responsible for managing the cluster resources and use them for scheduling users’ applications?
1. Hadoop Common 2. YARN 3. HDFS 4. MapReduce
Question 50 : ___________ is related to inconsistency in the data, which in turn hampers the data analysis process or creates hurdles for those who wish to analyze this form of data.
1. Variability 2. Variety 3. Volume 4. Complexity
Unit 4:
Question 1 This clustering algorithm terminates when mean values computed for the current iteration of the algorithm are identical to the computed mean values for the previous iteration Select one:
a. K-Means clustering
b. conceptual clustering
c. expectation maximization
d. agglomerative clustering
Question 2 This clustering approach initially assumes that each data instance represents a single cluster. Select one:
a. expectation maximization
b. K-Means clustering
c. agglomerative clustering
d. conceptual clustering
Question 3 The correlation coefficient for two real-valued attributes is – 0.85. What does this value tell you? Select one:
a. The attributes are not linearly related.
b. As the value of one attribute decreases the value of the second attribute increases.
c. As the value of one attribute increases the value of the second attribute also increases.
d. The attributes show a linear relationship
Question 4 Time Complexity of k-means is given by Select one:
a. O(mn)
b. O(tkn)
c. O(kn)
d. O(t2kn)
Question 5 Given a rule of the form IF X THEN Y, rule confidence is defined as the conditional probability that Select one:
a. Y is false when X is known to be false.
b. Y is true when X is known to be true.
c. X is true when Y is known to be true
d. X is false when Y is known to be false.
Question 6 Chameleon is Select one:
a. Density based clustering algorithm
b. Partitioning based algorithm
c. Model based algorithm
d. Hierarchical clustering algorithm
Question 7 In _________ clusterings, points may belong to multiple clusters Select one:
a. Non-exclusive
b. Partial
c. Fuzzy
d. Exclusive
Question 8 Find odd man out Select one:
a. DBSCAN
b. K mean
c. PAM
d. K medoid
Question 9 Which statement is true about the K-Means algorithm? Select one:
a. The output attribute must be categorical.
b. All attribute values must be categorical.
c. All attributes must be numeric
d. Attribute values may be either categorical or numeric
Question 10 This data transformation technique works well when minimum and maximum values for a real-valued attribute are known. Select one:
a. z-score normalization
b. min-max normalization
c. logarithmic normalization
d. decimal scaling
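The transformation behind Question 10: min-max normalization rescales each value into [0, 1] using the known minimum and maximum, x' = (x - min) / (max - min). A small sketch:

```python
def min_max_normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([20, 30, 40, 50]))   # [0.0, 0.33..., 0.67..., 1.0]
```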
Question 11 The number of iterations in apriori ___________ Select one:
a. increases with the size of the data
b. decreases with the increase in size of the data
c. increases with the size of the maximum frequent set
d. decreases with increase in size of the maximum frequent set
Question 12 Which of the following are interestingness measures for association rules? Select one:
a. recall
b. lift
c. accuracy
d. compactness
Question 13 Which one of the following is not a major strength of the neural network approach? Select one:
a. Neural network learning algorithms are guaranteed to converge to an optimal solution
b. Neural networks work well with datasets containing noisy data.
c. Neural networks can be used for both supervised learning and unsupervised clustering
d. Neural networks can be used for applications that require a time element to be included in the data
Question 14 Find odd man out Select one:
a. K medoid
b. K mean
c. DBSCAN
d. PAM
Question 15 Given a frequent itemset L, if |L| = k, then there are Select one:
a. 2^k - 1 candidate association rules
b. 2^k candidate association rules
c. 2^k - 2 candidate association rules
d. 2^k - 2 candidate association rules
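Why the answer to Question 15 is 2^k - 2: every non-empty proper subset S of the frequent itemset L yields one candidate rule S -> (L - S), and a k-item set has 2^k subsets, two of which (the empty set and L itself) are excluded. A small enumeration:

```python
from itertools import combinations

def candidate_rules(itemset):
    items = set(itemset)
    rules = []
    for r in range(1, len(items)):                    # non-empty, proper subsets only
        for antecedent in combinations(sorted(items), r):
            consequent = items - set(antecedent)
            rules.append((set(antecedent), consequent))
    return rules

print(len(candidate_rules({"A", "B", "C"})))   # 2**3 - 2 = 6
```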
Question 16 _________ is an example of case-based learning Select one:
a. Decision trees
b. Neural networks
c. Genetic algorithm
d. K-nearest neighbor
Question 17 The average positive difference between computed and desired outcome values. Select one:
a. mean positive error
b. mean squared error
c. mean absolute error
d. root mean squared error
Question 18 Frequent item sets is Select one:
a. Superset of only closed frequent item sets
b. Superset of only maximal frequent item sets
c. Subset of maximal frequent item sets
d. Superset of both closed frequent item sets and maximal frequent item sets
Question 19 Assume that we have a dataset containing information about 200 individuals. A supervised data mining session has discovered the following rule: IF age < 30 & credit card insurance = yes THEN life insurance = yes, with Rule Accuracy: 70% and Rule Coverage: 63%. How many individuals in the class life insurance = no have credit card insurance and are less than 30 years old? Select one:
a. 63
b. 30
c. 38
d. 70
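The arithmetic behind Question 19, plugging in the numbers given (200 individuals, 63% coverage, 70% accuracy):

```python
population = 200
covered = 0.63 * population     # individuals matching the rule antecedent: 126
correct = 0.70 * covered        # of those, life insurance = yes: about 88
incorrect = covered - correct   # of those, life insurance = no: about 38
print(round(incorrect))         # 38
```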
Question 20 Use the three-class confusion matrix below to answer: what percent of the instances were correctly classified?
                  Computed Class 1  Computed Class 2  Computed Class 3
  Actual Class 1         10                 5                 3
  Actual Class 2          5                15                 3
  Actual Class 3          2                 2                 5
Select one:
a. 60
b. 40
c. 50
d. 30
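Working through Question 20: the correctly classified instances lie on the diagonal of the confusion matrix, so the accuracy is (10 + 15 + 5) / 50 = 60%. In code:

```python
confusion = [
    [10, 5, 3],   # actual Class 1
    [5, 15, 3],   # actual Class 2
    [2, 2, 5],    # actual Class 3
]
correct = sum(confusion[i][i] for i in range(3))
total = sum(sum(row) for row in confusion)
print(100 * correct / total)   # 60.0
```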
Question 21 Which of the following is cluster analysis? Select one:
a. Simple segmentation
b. Grouping similar objects
c. Labeled classification
d. Query results grouping
Question 22 A good clustering method will produce high quality clusters with Select one:
a. high inter class similarity
b. low intra class similarity
c. high intra class similarity
d. no inter class similarity
Question 23 Which two parameters are needed for DBSCAN Select one:
a. Min threshold
b. Min points and eps
c. Min sup and min confidence
d. Number of centroids
Question 24 Which statement is true about neural network and linear regression models? Select one:
a. Both techniques build models whose output is determined by a linear sum of weighted input attribute values.
b. The output of both models is a categorical attribute value.
c. Both models require numeric attributes to range between 0 and 1.
d. Both models require input attributes to be numeric.
Question 25 In the Apriori algorithm, if there are 100 frequent 1-itemsets, then the number of candidate 2-itemsets is Select one:
a. 100
b. 4950
c. 200
d. 5000
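The count in Question 25 is simply the number of ways to pair up 100 frequent 1-itemsets: C(100, 2) = 100 x 99 / 2 = 4950.

```python
from math import comb

print(comb(100, 2))   # 4950
```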
Question 26 Significant Bottleneck in the Apriori algorithm is Select one:
a. Finding frequent itemsets
b. Pruning
c. Candidate generation
d. Number of iterations
Question 27 The concept of core, border and noise points falls under which category? Select one:
a. DENCLUE
b. Subspace clustering
c. Grid based
d. DBSCAN
Question 28 The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you? Select one:
a. The attributes show a linear relationship
b. The attributes are not linearly related.
c. As the value of one attribute increases the value of the second attribute also increases.
d. As the value of one attribute decreases the value of the second attribute increases.
Question 29 Machine learning techniques differ from statistical techniques in that machine learning methods Select one:
a. are better able to deal with missing and noisy data
b. typically assume an underlying distribution for the data
c. have trouble with large-sized datasets
d. are not able to explain their behavior.
Question 30 The probability of a hypothesis before the presentation of evidence. Select one:
a. a priori
b. posterior
c. conditional
d. subjective
Question 31 KDD represents extraction of Select one:
a. data
b. knowledge
c. rules
d. model
Question 32 Which statement about outliers is true? Select one:
a. Outliers should be part of the training dataset but should not be present in the test data.
b. Outliers should be identified and removed from a dataset.
c. The nature of the problem determines how outliers are used.
d. Outliers should be part of the test dataset but should not be present in the training data.
Question 33 The most general form of distance is Select one:
a. Manhattan
b. Eucledian
c. Mean
d. Minkowski
Question 34 Arbitrary shaped clusters can be found by using Select one:
a. Density methods
b. Partitional methods
c. Hierarchical methods
d. Agglomerative
Question 35 Which Association Rule would you prefer Select one:
a. High support and medium confidence
b. High support and low confidence
c. Low support and high confidence
d. Low support and low confidence
Question 36 With Bayes theorem the probability of hypothesis H, specified by P(H), is referred to as Select one:
a. a conditional probability
b. an a priori probability
c. a bidirectional probability
d. a posterior probability
Question 37 In a rule-based classifier, if there is a rule for each combination of attribute values, what do you call that rule set R? Select one:
a. Exhaustive
b. Inclusive
c. Comprehensive
d. Mutually exclusive
Question 38 The apriori property means Select one:
a. If a set cannot pass a test, its supersets will also fail the same test
b. To decrease the efficiency, do level-wise generation of frequent item sets
c. To improve the efficiency, do level-wise generation of frequent item sets
d. If a set can pass a test, its supersets will fail the same test
Question 39 If an item set ‘XYZ’ is a frequent item set, then all subsets of that frequent item set are Select one:
a. Undefined
b. Not frequent
c. Frequent
d. Can not say
Question 40 Clustering is ___________ and is an example of ____________ learning Select one:
a. Predictive and supervised
b. Predictive and unsupervised
c. Descriptive and supervised
d. Descriptive and unsupervised
Question 41 The probability that a person owns a sports car given that they subscribe to automotive magazine is 40%. We also know that 3% of the adult population subscribes to automotive magazine. The probability of a person owning a sports car given that they don't subscribe to automotive magazine is 30%. Use this information to compute the probability that a person subscribes to automotive magazine given that they own a sports car Select one:
a. 0.0368
b. 0.0396
c. 0.0389
d. 0.0398
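Questions 41 and 50 are the same Bayes-theorem exercise; plugging the given probabilities into the law of total probability and Bayes' rule:

```python
p_car_given_sub = 0.40     # P(sports car | subscribes)
p_sub = 0.03               # P(subscribes)
p_car_given_nosub = 0.30   # P(sports car | does not subscribe)

p_car = p_car_given_sub * p_sub + p_car_given_nosub * (1 - p_sub)  # total probability
p_sub_given_car = p_car_given_sub * p_sub / p_car                  # Bayes' theorem
print(round(p_sub_given_car, 4))   # 0.0396
```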
Question 42 Simple regression assumes a __________ relationship between the input attribute and output attribute. Select one:
a. quadratic
b. inverse
c. linear
d. reciprocal
Question 43 Which of the following algorithms comes under classification Select one:
a. Apriori
b. Brute force
c. DBSCAN
d. K-nearest neighbor
Question 44 Hierarchical agglomerative clustering is typically visualized as? Select one:
a. Dendrogram
b. Binary trees
c. Block diagram
d. Graph
Question 45 The _______ step eliminates the extensions of (k-1)-itemsets which are not found to be frequent, from being considered for counting support Select one:
a. Partitioning
b. Candidate generation
c. Itemset eliminations
d. Pruning
Question 46 To determine association rules from frequent item sets Select one:
a. Only minimum confidence needed
b. Neither support nor confidence needed
c. Both minimum support and confidence are needed
d. Minimum support is needed
Question 47 What is the final resultant cluster size in Divisive algorithm, which is one of the hierarchical clustering approaches? Select one:
a. Zero
b. Three
c. singleton
d. Two
Question 48 If {A,B,C,D} is a frequent itemset, which of the following candidate rules is not possible? Select one:
a. C –> A
b. D –>ABCD
c. A –> BC
d. B –> ADC
Question 49 Which Association Rule would you prefer Select one:
a. High support and low confidence
b. Low support and high confidence
c. Low support and low confidence
d. High support and medium confidence
Question 50 The probability that a person owns a sports car given that they subscribe to automotive magazine is 40%. We also know that 3% of the adult population subscribes to automotive magazine. The probability of a person owning a sports car given that they don’t subscribe to automotive magazine is 30%. Use this information to compute the probability that a person subscribes to automotive magazine given that they own a sports car
Select one:
a. 0.0398
b. 0.0389
c. 0.0368
d. 0.0396
Unit 5:
1. What is true about Data Visualization?
A. Data Visualization is used to communicate information clearly and efficiently to users by the usage of information graphics such as tables and charts. B. Data Visualization helps users in analyzing a large amount of data in a simpler way. C. Data Visualization makes complex data more accessible, understandable, and usable. D. All of the above
2. Data can be visualized using?
A. graphs B. charts C. maps D. All of the above
3. Data visualization is also an element of the broader _____________.
A. deliver presentation architecture B. data presentation architecture C. dataset presentation architecture D. data process architecture
4. Which method shows hierarchical data in a nested format?
A. Treemaps B. Scatter plots C. Population pyramids D. Area charts
5. Which is used to inference for 1 proportion using normal approx?
A. fisher.test() B. chisq.test() C. Lm.test() D. prop.test()
6. Which is used to find the factor congruence coefficients?
A. factor.mosaicplot B. factor.xyplot C. factor.congruence D. factor.cumsum
7. Which of the following is tool for checking normality?
A. qqline() B. qline() C. anova() D. lm()
8. Which of the following is false?
A. Data visualization includes the ability to absorb information quickly B. Data visualization is another form of visual art C. Data visualization decreases the insights and leads to slower decisions D. None of the above
9. Common use cases for data visualization include?
A. Politics B. Sales and marketing C. Healthcare D. All of the above
10. Which of the following plots are often used for checking randomness in time series?
A. Autocausation B. Autorank C. Autocorrelation D. None of the above
11. Which are pros of data visualization?
A. It can be accessed quickly by a wider audience. B. It can misrepresent information C. It can be distracting D. None Of the above
12. Which are cons of data visualization?
A. It conveys a lot of information in a small space. B. It makes your report more visually appealing.
C. visual data is distorted or excessively used. D. None Of the above
13. Which of the intricate techniques is not used for data visualization?
A. Bullet Graphs B. Bubble Clouds C. Fever Maps D. Heat Maps
14. Which one of the following is the most basic and commonly used technique?
A. Line charts B. Scatter plots C. Population pyramids D. Area charts
15. Which is used to query and edit graphical settings?
A. anova() B. par() C. plot() D. cum()
16. Which of the following methods makes a vector of repeated values?
A. rep() B. data() C. view() D. read()
17. Which function calls the lower-level function lm.fit?
A. lm() B. col.max
C. par D. histo
18. Which of the following lists names of variables in a data.frame?
A. par() B. names() C. barchart() D. quantile()
19. Which of the following statements is true?
A. Scientific visualization, sometimes referred to in shorthand as SciVis B. Healthcare professionals frequently use choropleth maps to visualize important health data. C. Candlestick charts are used as trading tools and help finance professionals analyze price movements over time D. All of the above
20. ________ is used for density plots?
A. par B. lm C. kde D. C
Answer key:
Unit :1
1. Ans : D
Explanation: Data Analysis is a process of inspecting, cleaning, transforming and modelling data with the goal of discovering useful information, suggesting conclusions and supporting decision-making.
2. Ans : B
Explanation: Predictive Analytics is a major data analysis approach, not Predictive Intelligence.
3. Ans : A
Explanation: In data analysis, two main statistical methodologies are used: descriptive statistics and inferential statistics.
4. Ans : C
Explanation: In descriptive statistics, data from the entire population or a sample is summarized with numerical descriptors.
5. Ans : D
Explanation: Data Analysis was defined by the statistician John Tukey in 1961 as "procedures for analyzing data".
6. Ans : A
Explanation: answering yes/no questions about the data (hypothesis testing)
7. Ans : A
Explanation: The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
8. Ans : D
Explanation: The branch of statistics which deals with development of particular statistical methods is classified as applied statistics.
9. Ans : C
Explanation: modeling relationships within the data (E.g. regression analysis).
10 Ans : A
Explanation: Text Data Mining is the process of deriving high-quality information from text.
11. personalization
12. CRM analytics
13. business intelligence
14. database marketing
15. hosted CRM
16. All of the above
17. Cascalog
18. All of these
19. All the above
20. All of the above
21. Project Prism
22. Recall
UNIT 2:
1. b
2. c
3. A
4. c
5. a
6. b
7. a
8. c
9. B
10. D
11. A
12. B
13. D
14.A
15. D
16. A
17. B
18. C
19. A broad term, the most commonly used technique for doing factor analysis.
20. C
21. Answer: c Explanation: With fuzzy logic, set membership is defined by a value between 0 and 1, so an element can belong to a set to many different degrees.
22. Answer: a Explanation: In traditional set theory, membership is exact: an element either is in the set or is not, so there are only two crisp values, true or false. In fuzzy logic an element belongs to a set with some weight x.
23. Answer: a Explanation: Refer to the definitions of fuzzy set and crisp set.
24. Answer: a Explanation: None.
25. Answer: a Explanation: Fuzzy logic deals with linguistic variables.
26. Answer: b Explanation: Both probabilities and degrees of truth range between 0 and 1.
27. Answer: a Explanation: None.
28. Answer: d Explanation: The AND, OR and NOT operators of Boolean logic exist in fuzzy logic, usually defined as the minimum, maximum and complement.
29. Answer: a Explanation: None.
30. Answer: b Explanation: Fuzzy set theory defines fuzzy operators on fuzzy sets, but the appropriate fuzzy operator may not be known in advance. For this reason, fuzzy logic usually uses IF-THEN rules, or constructs that are equivalent, such as fuzzy associative matrices. Rules are usually expressed in the form: IF variable IS property THEN action.
31. Answer: a Explanation: Once fuzzy relations are defined, it is possible to develop fuzzy relational databases. The first fuzzy relational database, FRDB, appeared in Maria Zemankova's dissertation.
32. Answer: d Explanation: Fuzzy logic, probability and entropy are all ways to represent uncertainty; entropy is the amount of uncertainty in data, written H(data).
33. Answer: b Explanation: Ecorithms are algorithms that learn from their environments to generalize, approximate and simplify solution logic.
Unit 4:
1. K-Means clustering
2. agglomerative clustering
3. As the value of one attribute decreases the value of the second attribute increases.
4. O(tkn)
5. Y is true when X is known to be true
6. Hierarchical clustering algorithm
7. Fuzzy
8. DBSCAN
9. All attributes must be numeric
10. min-max normalization
11. increases with the size of the maximum frequent set
12. lift
13. Neural network learning algorithms are guaranteed to converge to an optimal solution
14. DBSCAN
15. 2^k - 2 candidate association rules
16. K-nearest neighbor
17. mean absolute error
18. Superset of both closed frequent item sets and maximal frequent item sets
19. 38
20. 60
21. Grouping similar objects
22. high intra class similarity
23. Min points and eps
24. Both models require input attributes to be numeric.
25. 4950
26. Candidate generation
27. DBSCAN
28. As the value of one attribute decreases the value of the second attribute increases.
29. are better able to deal with missing and noisy data
30. a priori
31. knowledge
32. The nature of the problem determines how outliers are used
33. Minkowski
34. Density methods
35. Low support and high confidence
36. an a priori probability
37. Exhaustive
38. If a set cannot pass a test, its supersets will also fail the same test
39. Frequent
40. Descriptive and unsupervised
41. 0.0396
42. linear
43. K-nearest neighbor
44. Dendrogram
45. Pruning
46. Both minimum support and confidence are needed
47. singleton
48. D –> ABCD
49. Low support and high confidence
50. 0.0396
Unit 5:
1. Ans : D
Explanation: Data Visualization is used to communicate information clearly and efficiently to users by the usage of information graphics such as tables and charts. It helps users in analyzing a large amount of data in a simpler way. It makes complex data more accessible, understandable, and usable.
2. Ans : D
Explanation: Data visualization is a graphical representation of quantitative information and data by using visual elements like graphs, charts, and maps.
3. Ans : B
Explanation: Data visualization is also an element of the broader data presentation architecture (DPA) discipline, which aims to identify, locate, manipulate, format and deliver data in the most efficient way possible.
4. Ans : A
Explanation: Treemaps are best used when multiple categories are present, and the goal is to compare different parts of a whole.
5. Ans : D
Explanation: prop.test() is used to inference for 1 proportion using normal approx.
6. Ans : C
Explanation: factor.congruence is used to find the factor congruence coefficients.
7. Ans : A
Explanation: qqline() (used together with qqnorm()) is a tool for checking normality.
8. Ans : C
Explanation: "Data visualization decreases the insights and leads to slower decisions" is the false statement.
9. Ans : D
Explanation: All option are Common use cases for data visualization.
10. Ans : C
Explanation: If the time series is random, such autocorrelations should be near zero for any and all time-lag separations.
11. Ans : A
Explanation: Pros of data visualization : it can be accessed quickly by a wider audience.
12. Ans : C
Explanation: It can be distracting if the visual data is distorted or excessively used.
13. Ans : C
Explanation: Fever maps are not used for data visualization; fever charts are used instead.
14. Ans : A
Explanation: Line charts. This is one of the most basic and common techniques used. Line charts display how variables can change over time.
15. Ans : B
Explanation: par() is used to query and edit graphical settings.
16. Ans : A
Explanation: rep() makes a vector of repeated values; data() loads a built-in dataset (often into a data.frame).
17. Ans : A
Explanation: lm calls the lower level functions lm.fit.
18. Ans : B
Explanation: names() lists (and can set) the names of the variables in a data.frame.
19. Ans : D
Explanation: All option are correct.
20. Ans : C
Explanation: kde is used for density plots.
MCQ for UNIT 5
1. Point out the correct statement. a) Hadoop is an ideal environment for extracting and transforming small volumes of data b) Hadoop stores data in HDFS and supports data compression/decompression c) The Giraph framework is less useful than a MapReduce job to solve graph and machine learning d) None of the mentioned
2. Which of the following genres does Hadoop produce? a) Distributed file system b) JAX-RS c) Java Message Service d) Relational Database Management System
3. Which of the following platforms does Hadoop run on? a) Bare metal b) Debian c) Cross-platform d) Unix-like
4. Hadoop achieves reliability by replicating the data across multiple hosts and hence does not require ________ storage on hosts. a) RAID b) Standard RAID levels c) ZFS d) Operating system
5. The Hadoop list includes the HBase database, the Apache Mahout ________ system, and matrix operations. a) Machine learning b) Pattern recognition c) Statistical classification d) Artificial intelligence
6. As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including _______________ a) Improved data storage and information retrieval b) Improved extract, transform and load features for data integration c) Improved data warehousing functionality d) Improved security, workload management, and SQL support
7. Point out the correct statement. a) Hadoop do need specialized hardware to process the data b) Hadoop 2.0 allows live stream processing of real-time data c) In the Hadoop programming framework output files are divided into lines or records d) None of the mentioned
8. According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop? a) Big data management and data mining b) Data warehousing and business intelligence c) Management of Hadoop clusters d) Collecting and storing unstructured data
9. Hadoop is a framework that works with a variety of related tools. Common cohorts include ____________ a) MapReduce, Hive and HBase b) MapReduce, MySQL and Google Apps c) MapReduce, Hummer and Iguana d) MapReduce, Heron and Trumpet
10. Point out the wrong statement. a) Hadoop processing capabilities are huge and its real advantage lies in the ability to process terabytes & petabytes of data b) Hadoop uses a programming model called “MapReduce”, all the programs should conform to this model in order to work on the Hadoop platform c) The programming model, MapReduce, used by Hadoop is difficult to write and test d) All of the mentioned
11. What was Hadoop named after? a) Creator Doug Cutting’s favorite circus act b) Cutting’s high school rock band c) The toy elephant of Cutting’s son d) A sound Cutting’s laptop made during Hadoop development
12. All of the following accurately describe Hadoop, EXCEPT ____________ a) Open-source b) Real-time c) Java-based d) Distributed computing approach
13. __________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data. a) MapReduce b) Mahout
c) Oozie d) All of the mentioned
14. __________ has the world’s largest Hadoop cluster. a) Apple b) Datamatics c) Facebook d) None of the mentioned
15. Facebook Tackles Big Data With _______ based on Hadoop. a) ‘Project Prism’ b) ‘Prism’ c) ‘Project Big’ d) ‘Project Data’
16. ________ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets. a) Pig Latin b) Oozie c) Pig d) Hive
17. Point out the correct statement. a) Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data b) Hive is a relational database with SQL support c) Pig is a relational database with SQL support d) All of the mentioned
18. Hive also support custom extensions written in ____________ a) C# b) Java c) C d) C++
19. Point out the wrong statement. a) Elastic MapReduce (EMR) is Facebook’s packaged Hadoop offering b) Amazon Web Service Elastic MapReduce (EMR) is Amazon’s packaged Hadoop offering c) Scalding is a Scala API on top of Cascading that removes most Java boilerplate d) All of the mentioned
20. ___________ is general-purpose computing model and runtime system for distributed data analytics. a) Mapreduce b) Drill
c) Oozie d) None of the mentioned
21. The Pig Latin scripting language is not only a higher-level data flow language but also has operators similar to ____________ a) SQL b) JSON c) XML d) All of the mentioned
22. _______ jobs are optimized for scalability but not latency. a) Mapreduce b) Drill c) Oozie d) Hive
23. A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the JobTracker. a) MapReduce b) Mapper c) TaskTracker d) JobTracker
24. Point out the correct statement. a) MapReduce tries to place the data and the compute as close as possible b) Map Task in MapReduce is performed using the Mapper() function c) Reduce Task in MapReduce is performed using the Map() function d) All of the mentioned
25. ___________ part of the MapReduce is responsible for processing one or more chunks of data and producing the output results. a) Maptask b) Mapper c) Task execution d) All of the mentioned
26. _________ function is responsible for consolidating the results produced by each of the Map() functions/tasks. a) Reduce b) Map c) Reducer d) All of the mentioned
27. ________ is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer.
a) Hadoop Strdata b) Hadoop Streaming c) Hadoop Stream d) None of the mentioned
28. __________ maps input key/value pairs to a set of intermediate key/value pairs. a) Mapper b) Reducer c) Both Mapper and Reducer d) None of the mentioned
29. The number of maps is usually driven by the total size of ____________ a) inputs b) outputs c) tasks d) None of the mentioned
30. Running a ___________ program involves running mapping tasks on many or all of the nodes in our cluster. a) MapReduce b) Map c) Reducer d) All of the mentioned
31. A ________ serves as the master and there is only one NameNode per cluster. a) Data Node b) NameNode c) Data block d) Replication
32. Point out the correct statement. a) DataNode is the slave/worker node and holds the user data in the form of Data Blocks b) Each incoming file is broken into 32 MB by default c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance d) None of the mentioned
33. HDFS works in a __________ fashion. a) master-worker b) master-slave c) worker/slave d) all of the mentioned
34. Point out the wrong statement. a) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file level b) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode c) User data is stored on the local file system of DataNodes d) DataNode is aware of the files to which the blocks stored on it belong to
35. Which of the following scenario may not be a good fit for HDFS? a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file b) HDFS is suitable for storing data related to applications requiring low latency data access c) HDFS is suitable for storing data related to applications requiring low latency data access d) None of the mentioned
36. The need for data replication can arise in various scenarios like ____________ a) Replication Factor is changed b) DataNode goes down c) Data Blocks get corrupted d) All of the mentioned
37. ________ is the slave/worker node and holds the user data in the form of Data Blocks. a) DataNode b) NameNode c) Data block d) Replication
38. HDFS provides a command line interface called __________ used to interact with HDFS. a) “HDFS Shell” b) “FS Shell” c) “DFS Shell” d) None of the mentioned
39. HDFS is implemented in _____________ programming language. a) C++ b) Java c) Scala d) None of the mentioned
40. For YARN, the ___________ Manager UI provides host and port information. a) Data Node b) NameNode
c) Resource d) Replication
41. During start up, the ___________ loads the file system state from the fsimage and the edits log file. a) DataNode b) NameNode c) ActionNode d) None of the mentioned
42. Point out the correct statement. a) A Hadoop archive maps to a file system directory b) Hadoop archives are special format archives c) A Hadoop archive always has a *.har extension d) All of the mentioned
43. Using Hadoop Archives in __________ is as easy as specifying a different input filesystem than the default file system. a) Hive b) Pig c) MapReduce d) All of the mentioned
44. Pig operates in mainly how many nodes? a) Two b) Three c) Four d) Five
45. Point out the correct statement. a) You can run Pig in either mode using the “pig” command b) You can run Pig in batch mode using the Grunt shell c) You can run Pig in interactive mode using the FS shell d) None of the mentioned
46. You can run Pig in batch mode using __________ a) Pig shell command b) Pig scripts c) Pig options d) All of the mentioned
47. Pig Latin statements are generally organized in one of the following ways? a) A LOAD statement to read data from the file system b) A series of “transformation” statements to process the data
c) A DUMP statement to view results or a STORE statement to save the results d) All of the mentioned
48. Point out the wrong statement. a) To run Pig in local mode, you need access to a single machine b) The DISPLAY operator will display the results to your terminal screen c) To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation d) All of the mentioned
49. Which of the following function is used to read data in PIG? a) WRITE b) READ c) LOAD d) None of the mentioned
50. You can run Pig in interactive mode using the ______ shell. a) Grunt b) FS c) HDFS d) None of the mentioned
51. HBase is a distributed ________ database built on top of the Hadoop file system. a) Column-oriented b) Row-oriented c) Tuple-oriented d) None of the mentioned
52. Point out the correct statement. a) HDFS provides low latency access to single rows from billions of records (Random access) b) HBase sits on top of the Hadoop File System and provides read and write access c) HBase is a distributed file system suitable for storing large files d) None of the mentioned
53. HBase is ________ defines only column families. a) Row Oriented b) Schema-less c) Fixed Schema d) All of the mentioned
54. Apache HBase is a non-relational database modeled after Google’s _________ a) BigTop b) Bigtable
c) Scanner d) FoundationDB
55. Point out the wrong statement. a) HBase provides only sequential access to data b) HBase provides high latency batch processing c) HBase internally provides serialized access d) All of the mentioned
56. The _________ Server assigns regions to the region servers and takes the help of Apache ZooKeeper for this task. a) Region b) Master c) Zookeeper d) All of the mentioned
57. Which of the following command provides information about the user? a) status b) version c) whoami d) user
58. Which of the following command does not operate on tables? a) enabled b) disabled c) drop d) all of the mentioned
59. _________ command fetches the contents of a row or a cell. a) select b) get c) put d) none of the mentioned
60. HBaseAdmin and ____________ are the two important classes in this package that provide DDL functionalities. a) HTableDescriptor b) HDescriptor c) HTable d) HTabDescriptor
61. Which of the following is not a NoSQL database? a) SQL Server b) MongoDB
c) Cassandra d) None of the mentioned
62. Point out the correct statement. a) Documents can contain many different key-value pairs, or key-array pairs, or even nested documents b) MongoDB has official drivers for a variety of popular programming languages and development environments c) When compared to relational databases, NoSQL databases are more scalable and provide superior performance d) All of the mentioned
63. Which of the following is a NoSQL Database Type? a) SQL b) Document databases c) JSON d) All of the mentioned
64. Which of the following is a wide-column store? a) Cassandra b) Riak c) MongoDB d) Redis
65. Point out the wrong statement. a) Non Relational databases require that schemas be defined before you can add data b) NoSQL databases are built to allow the insertion of data without a predefined schema c) NewSQL databases are built to allow the insertion of data without a predefined schema d) All of the mentioned
66. Most NoSQL databases support automatic __________ meaning that you get high availability and disaster recovery. a) processing b) scalability c) replication d) all of the mentioned
67. Which of the following are the simplest NoSQL databases? a) Key-value b) Wide-column c) Document d) All of the mentioned
68. ________ stores are used to store information about networks, such as social connections. a) Key-value b) Wide-column c) Document d) Graph
69. NoSQL databases is used mainly for handling large volumes of ______________ data. a) unstructured b) structured c) semi-structured d) all of the mentioned
70. Point out the wrong statement? a) Key feature of R was that its syntax is very similar to S b) R runs only on Windows computing platform and operating system c) R has been reported to be running on modern tablets, phones, PDAs, and game consoles d) R functionality is divided into a number of Packages
71. R functionality is divided into a number of ________ a) Packages b) Functions c) Domains d) Classes
72. Which Package contains most fundamental functions to run R? a) root b) child c) base d) parent
73. Point out the wrong statement? a) One nice feature that R shares with many popular open source projects is frequent releases b) R has sophisticated graphics capabilities c) S’s base graphics system allows for very fine control over essentially every aspect of a plot or graph d) All of the mentioned
74. Which of the following is a base package for R language? a) util b) lang
c) tools d) spatial
75. Which of the following is “Recommended” package in R? a) util b) lang c) stats d) spatial
76. What is the output of getOption(“defaultPackages”) in R studio? a) Installs a new package b) Shows default packages in R c) Error d) Nothing will print
77. Which of the following is used for Statistical analysis in R language? a) RStudio b) Studio c) Heck d) KStudio
78. In R language, a vector is defined that it can only contain objects of the ________ a) Same class b) Different class c) Similar class d) Any class
79. A list is represented as a vector but can contain objects of ___________ a) Same class b) Different class c) Similar class d) Any class
80. How can we define ‘undefined value’ in R language? a) Inf b) Sup c) Und d) NaN
81. What is NaN called? a) Not a Number b) Not a Numeric c) Number and Number d) Number a Numeric
82. How can we define ‘infinity’ in R language? a) Inf b) Sup c) Und d) NaN
83. Which one of the following is not a basic datatype? a) Numeric b) Character c) Data frame d) Integer
84. Matrices can be created by row-binding with the help of which of the following functions? a) rjoin() b) rbind() c) rowbind() d) rbinding()
85. What is the function used to test objects (returns a logical operator) if they are NA? a) is.na() b) is.nan() c) as.na() d) as.nan()
86. What is the function used to test objects (returns a logical operator) if they are NaN? a) as.nan() b) is.na() c) as.na() d) is.nan()
87. What is the function to set column names for a matrix? a) names() b) colnames() c) col.names() d) column name cannot be set for a matrix
88. The most convenient way to use R is at a graphics workstation running a ________ system. a) windowing b) running c) interfacing d) matrix
89. Point out the wrong statement? a) Setting up a workstation to take full advantage of the customizable features of R is a
straightforward thing b) q() is used to quit the R program c) R has an inbuilt help facility similar to the man facility of UNIX d) Windows versions of R have other optional help systems also
90. Point out the wrong statement? a) Windows versions of R have other optional help system also b) The help.search command (alternatively ??) allows searching for help in various ways c) R is case insensitive as are most UNIX based packages, so A and a are different symbols and would refer to different variables d) $ R is used to start the R program
91. Elementary commands in R consist of either _______ or assignments. a) utilstats b) language c) expressions d) packages
92. How do you install a package “for” and all of the other packages on which “for” depends? a) install.packages (for, depends = TRUE) b) R.install.packages (“for”, depends = TRUE) c) install.packages (“for”, depends = TRUE) d) install (“for”, depends = FALSE)
93. __________ function is used to watch for all available packages in library. a) lib() b) fun.lib() c) libr() d) library()
94. Attributes of an object (if any) can be accessed using the ______ function. a) objects() b) attrib() c) attributes() d) obj()
95. R objects can have attributes, which are like ________ for the object. a) metadata b) features c) expression d) dimensions
96. ________ generate random Normal variates with a given mean and standard deviation. a) dnorm b) rnorm
c) pnorm d) rpois
97. Point out the correct statement? a) R comes with a set of pseudo-random number generators b) Random number generators cannot be used to model random inputs c) Statistical procedure does not require random number generation d) For each probability distribution there are typically three functions
98. ______ evaluate the cumulative distribution function for a Normal distribution. a) dnorm b) rnorm c) pnorm d) rpois
99. _______ generate random Poisson variates with a given rate. a) dnorm b) rnorm c) pnorm d) rpois
100. Point out the wrong statement? a) For each probability distribution there are typically three functions b) For each probability distribution there are typically four functions c) r function is sufficient for simulating random numbers d) R comes with a set of pseudo-random number generators
101. _________ is the most common probability distribution to work with. a) Gaussian b) Parametric c) Paradox d) Simulation
102. Point out the correct statement? a) When simulating any random numbers it is not essential to set the random number seed b) It is not possible to generate random numbers from other probability distributions like the Poisson c) You should always set the random number seed when conducting a simulation d) Statistical procedure does not require random number generation
103. _______ function is used to simulate binary random variables. a) dnorm b) rbinom() c) binom() d) rpois
104. Point out the wrong statement? a) Drawing samples from specific probability distributions can be done with “s” functions b) The sample() function draws randomly from a specified set of (scalar) objects allowing you to sample from arbitrary distributions of numbers c) The sampling() function draws randomly from a specified set of objects d) You should always set the random number seed when conducting a simulation
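For readers who want to try the ideas behind questions 96-104 hands-on, the short sketch below reproduces them in Python rather than R; numpy and scipy (assumed to be installed) are only analogues of R's rnorm/pnorm/rpois/rbinom/sample, not the R functions themselves, and the seed call plays the role of set.seed().

```python
import numpy as np
from scipy.stats import norm

np.random.seed(42)  # analogous to set.seed(): makes the simulation reproducible

normals  = np.random.normal(loc=10, scale=2, size=5)  # like rnorm(5, mean = 10, sd = 2)
cdf_val  = norm.cdf(1.96)                             # like pnorm(1.96): Normal cumulative distribution
poissons = np.random.poisson(lam=3, size=5)           # like rpois(5, lambda = 3)
coins    = np.random.binomial(n=1, p=0.5, size=5)     # like rbinom(5, 1, 0.5): binary random variables
draws    = np.random.choice([2, 4, 6, 8], size=5)     # like sample(): draw from an arbitrary set of objects

print(normals, cdf_val, poissons, coins, draws)
```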
105. _______ grammar makes a clear distinction between your data and what gets displayed on the screen or page. a) ggplot1 b) ggplot2 c) d3.js d) ggplot3
106. Point out the wrong statement? a) mean_se is used to calculate mean and standard errors on either side b) hmisc wraps up a selection of summary functions from Hmisc to make it easy to use c) plot is used to create a scatterplot matrix (experimental) d) translate_qplot_base is used for translating between qplot and base graphics
107. Which of the following cuts numeric vector into intervals of equal length? a) cut_interval b) cut_time c) cut_number d) cut_date
108. Which of the following is a plot to investigate the order in which observations were recorded? a) ggplot b) ggsave c) ggpcp d) ggorder
109. ________ is used for translating between qplot and base graphics. a) translate_qplot_base b) translate_qplot_gpl c) translate_qplot_lattice d) translate_qplot_ggplot
110. Which of the following is a discrete scale constructor? a) discrete_scale b) ggpcp c) ggfluctuation d) ggmissing
111. Which of the following creates fluctuation plot? a) ggmissplot b) ggmissing c) ggfluctuation d) ggpcp
112. __________ create a complete ggplot appropriate to a particular data type. a) autoplot b) is.ggplot c) printplot d) qplot_ggplot
113. Which of the following creates a new ggplot plot from a data frame? a) qplot_ggplot b) ggplot.data.frame c) ggfluctuation d) ggmissplot
Department of Information Technology
DATA ANALYTICS – KIT601 – Question Bank
UNIT-1
1. Data originally collected in the process of investigation are known as a) Foreign data b) Primary data c) Third data d) Secondary data e) None of these
2. Statistical enquiry means a) It is science for knowledge b) Search for knowledge c) Collection of anything d) Search for knowledge with the help of statistical methods e) None of these
3. Cluster sampling means a) Sample is divided into a number of sub-groups b) Samples are selected at regular intervals c) Sample is obtained by conscious selection d) Universe is divided into groups e) None of these
4. What is Secondary data? a) Data collected in the process of investigation b) Data collected from some other agency c) Data collected from questionnaire of a person d) Both A & B e) None of these
5. What is information? a) Raw facts b) Processed data c) Understanding facts d) Knowing action on data e) None of these
6. Data about rocks is an example of a) Time dependent data b) Time Independent data c) Location dependent data d) Location independent data e) None of these
7. Range on temperature scale is termed as a) Nominal data b) Ordinal data
c) Interval data d) Ratio data e) None of these
8. Data in XML and CSV format is an example of a) Structure data b) Un-structure data c) Semi-structure data d) Both A & B e) None of these
9. Which is not a characteristic of data? a) Accuracy b) Consistency c) Granularity d) Redundant e) None of these
10. Hadoop is a framework that works with a variety of related tools. Common cohorts include: a) MapReduce, Hive and HBase b) MapReduce, MySQL and Google Apps c) MapReduce, Hummer and Iguana d) MapReduce, Heron and Trumpet e) None of these
11. Which is not a V in Big Data? a) Volume b) Veracity c) Vigor d) Velocity e) None of these
12. Which is not true about Traditional decision making? a) Does not require human intervention b) Takes a long time to come to decision c) Lacks systematic linkage in planning d) Provides limited scope of data analytics e) None of these
13. Cloudera is a product of a) Microsoft b) Apache c) Google d) Facebook e) None of these
14. What is not true about MPP architecture? a) Tightly coupled nodes b) High speed connection among nodes c) Disks are not shared
d) Uses a lot of processors e) None of these
15. The process of organizing and summarizing data in an easily readable format to communicate important information is known as a) Analysis b) Reporting c) Clustering d) Mining e) None of these
16. Out of the following which is not a type of report a) Canned b) Dashboard c) Ad hoc response d) Alerts e) None of these
17. Data Analysis is a process of? a) inspecting data b) cleaning data c) transforming data d) All of above e) None of these
18. Which of the following is not a major data analysis approaches? a) Data Mining b) Predictive Intelligence c) Business Intelligence d) Text Analytics e) None of these
19. How many main statistical methodologies are used in data analysis? a) 2 b) 3 c) 4 d) 5 e) None of these
20. Which of the following is true about regression analysis? a) answering yes/no questions about the data b) estimating numerical characteristics of the data c) modeling relationships within the data d) describing associations within the data e) None of these
21. __________ may be defined as the data objects that do not comply with the general behavior or model of the data available. a) Outlier Analysis b) Evolution Analysis
c) Prediction d) Classification e) None of these
22. What is the use of data cleaning? a) to remove the noisy data b) correct the inconsistencies in data c) transformations to correct the wrong data. d) All of the above e) None of these
23. In data mining, this is a technique used to predict future behavior and anticipate the consequences of change. a) predictive technology b) disaster recovery c) phase change d) predictive modeling e) None of these
24. What are the main components of Big Data? a) MapReduce b) HDFS c) HBASE d) All of these e) None of these
25. ———- is data that depends on a data model and resides in a fixed field within a record. a) Structured data b) Un-Structured data c) Semi-Structured data d) Scattered e) None of these
26. —————- is about developing code to enable the machine to learn to perform tasks; its basic principle is the automatic modeling of the underlying processes that have generated the collected data. a) Data Science b) Data Analytics c) Data Mining d) Data Warehousing e) None of these
27. —————– is an example of human generated unstructured data. a) YouTube data b) Satellite data c) Sensor data d) Seismic imagery data e) None of these
28. Height is an example of which type of attribute a) Nominal b) Binary c) Ordinal d) Numeric e) None of these
29. ————- type of analytics describes what happened in the past a) Descriptive b) Prescriptive c) Predictive d) Probability e) None of these
30. ————– data does not fit into a data model due to variations in contents a) Structured data b) Un-Structured data c) Semi Structured data d) Both B & C e) None of these
UNIT-2
31. A and B are two events. If P(A, B) decreases while P(A) increases, which of the following is true? a) P(A|B) decreases b) P(B|A) decreases c) P(B) decreases d) All of above e) None of these
32. Suppose we like to calculate P(H|E, F) and we have no conditional independence information. Which of the following sets of numbers are sufficient for the calculation? a) P(E, F), P(H), P(E|H), P(F|H) b) P(E, F), P(H), P(E, F|H) c) P(H), P(E|H), P(F|H) d) P(E, F), P(E|H), P(F|H) e) None of these
33. Suppose you are given an EM algorithm that finds maximum likelihood estimates for a model with latent variables. You are asked to modify the algorithm so that it finds MAP estimates instead. Which step or steps do you need to modify? a) Expectation b) Maximization c) No modification necessary d) Both A & B e) None of these
34. Compared to the variance of the Maximum Likelihood Estimate (MLE), the variance of the Maximum A Posteriori (MAP) estimate is ________ a) higher b) same c) lower d) it could be any of the above e) None of these
35. Bayesian methods are important to our study of machine learning because they provide a useful perspective for understanding many learning algorithms that do not ............................ manipulate probabilities. a) explicitly b) implicitly c) both a & b d) approximately e) None of these
36. The results that we get after we apply Bayesian Theorem to a problem are, a) 100% accurate b) Estimated values c) Wrong values d) Only positive values e) None of these
37. The previous probabilities in Bayes theorem that are changed with the help of new available information are classified as a) independent probabilities b) posterior probabilities c) interior probabilities d) dependent probabilities e) None of these
38. In contrast to the naive Bayes classifier, Bayesian belief networks allow stating conditional independence assumptions that apply to ............................... of the variables. a) subsets b) super sets c) empty set d) All of above e) None of these
39. The naive Bayes classifier applies to learning tasks where each instance x is described by a conjunction of attribute values and where the target function f ( x ) can take on ................. value from some................... set V. a) one, finite b) any, infinite c) one, infinite d) any, finite e) None of these
40. Bayes rule can be used to........................conditioned on one piece of evidence. a) solve queries b) increase complexity of a query c) decrease complexity of a query d) answer probabilistic queries e) None of these
41. Among which of the following mentioned statements can the Bayesian probability be applied? (i) In the cases, where we have one event (ii) In the cases, where we have two events (iii) In the cases, where we have three events (iv) In the cases, where we have more than three events
Options:
a) Only iv. b) All i., ii., iii. and iv. c) ii. and iv. d) Only ii. e) None of these
42. How the Bayesian network can be used to answer any query? a) Full distribution b) Joint distribution
c) Partial distribution d) All of the mentioned above e) None of these
43. Which of the following methods do we use to find the best fit line for data in Linear Regression? a) Least Square Error b) Maximum Likelihood c) Logarithmic Loss d) Both A and B e) None of these
44. Linear Regression is a ..................... machine learning algorithm. a) supervised b) unsupervised c) reinforcement d) Both A & B e) None of these
45. Which of the following statements is true about outliers in linear regression? a) Linear regression is not sensitive to outliers b) Linear regression is sensitive to outliers c) Can’t say d) There are no outliers e) None of these
46. Which of the following sentence is FALSE regarding regression? a) It relates inputs to outputs. b) It is used for prediction. c) It may be used for interpretation. d) It discovers causal relationships. e) None of these
47. Which of the following methods do we use to best fit the data in Logistic Regression? a) Least Square Error b) Maximum Likelihood c) Jaccard distance d) Both A & B e) None of these
48. Which of the following options is true? a) Linear Regression error values have to be normally distributed but in the case of Logistic Regression it is not the case b) Logistic Regression error values have to be normally distributed but in the case of Linear Regression it is not the case c) Both Linear Regression and Logistic Regression error values have to be normally distributed d) Neither Linear Regression nor Logistic Regression error values have to be normally distributed e) None of these
49. A decision tree is also known as a) general tree b) binary tree c) prediction tree d) fuzzy tree e) None of these
50. The confusion matrix is a useful tool for analyzing a) Regression b) Classification c) Sampling d) Cross Validation e) None of these
51. In regression, the independent variable is also called ———– a) Regressor b) Continuous c) Regressand d) Estimated e) None of these
52. ————— searches for the linear optimal separating hyperplane for separation of the data using essential training tuples called support vectors a) Decision tree b) Association Rule Mining c) Clustering d) Support vector machines e) None of these
53. Which of the following is used as attribute selection measure in decision tree algorithms? a) Information Gain b) Posterior probability c) Prior probability d) Support e) None of these
54. ———- is an unsupervised technique aiming to divide a multivariate dataset into clusters or groups. a) KNN b) SVM c) Regression d) Cluster Analysis e) None of these
55. A perfect negative correlation is signified by ————- a) 1 b) -1 c) 0 d) 2
e) None of these
56. ———— rule mining is a technique to identify underlying relations between different items. a) Classification b) Regression c) Clustering d) Association e) None of these
57. ———– is a supervised machine learning algorithm that outputs an optimal hyperplane for given labeled training data a) KNN b) SVM c) Regression d) Decision Tree e) None of these
58. Which of the following is a measure used in decision trees when selecting the splitting criterion that partitions the data in the best possible manner? a) Probability b) Gini Index c) Regression d) Confusion matrix e) None of these
59. Which of the following is not a type of clustering algorithm? a) Density clustering b) K-Means clustering c) Centroid clustering d) Simple clustering e) None of these
60. —— answers the questions like ” How can we make it happen?” a) Descriptive b) Prescriptive c) Predictive d) Probability e) None of these
UNIT-3
61. A company wants to divide its customers into distinct groups to send offers; this is an example of a) Data Extraction b) Data Classification c) Data Discrimination d) Data Selection e) None of these
62. When do we use Manhattan distance in data mining? a) Dimension of the data decreases b) Dimension of the data increases c) Under fitting d) Moderate size of the dimensions e) None of these
63. When there is no impact on one variable when increase or decrease on other variable then it is ———— a) Perfect correlation b) Positive correlation c) Negative correlation d) No correlation e) None of these
64. Apriori algorithm uses breadth first search and ————structure to count candidate item sets efficiently. a) Decision tree b) Hash Tree c) Red-Black Tree d) AVL Tree e) None of these
65. To determine basic salary of an employee when his qualification is given is a ———– problem a) Correlation b) Regression c) Association d) Qualitative e) None of these
66. ———— is the step performed by a data scientist after acquiring the data. a) Data Cleansing b) Data Integration c) Data Replication d) Data loading e) None of these
67. ———– is an indication of how often the rule has been found to be true in association rule mining. a) Confidence
b) Support c) Lift d) Accuracy e) None of these
68. Which of the following statements about data streaming is true? a) Stream data is always unstructured data. b) Stream data often has a high velocity. c) Stream elements cannot be stored on disk. d) Stream data is always structured data. e) None of these
69. A Bloom filter guarantees no a) false positives b) false negatives c) false positives and false negatives d) false positives or false negatives, depending on the Bloom filter type e) None of these
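To make question 69's guarantee concrete, here is a minimal, hypothetical Bloom filter in Python (the class name, sizes, and use of salted SHA-256 digests are illustrative choices, not any fixed library API): once an element is added, all of its bit positions are set, so a membership test can never return a false negative, while hash collisions can still produce false positives.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k salted hash positions over an m-bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for salt in range(self.k):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # Every added item finds all of its bits set, so there are no false negatives;
        # unrelated items may collide on set bits, so false positives remain possible.
        return all(self.bits[pos] == 1 for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"))    # always True
print(bf.might_contain("mallory"))  # usually False, occasionally a false positive
```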
70. The FM-sketch algorithm can be used to: a) Estimate the number of distinct elements. b) Sample data with a time-sensitive window. c) Estimate the frequent elements. d) Determine whether an element has already occurred in previous stream data. e) None of these
71. The DGIM algorithm was developed to estimate the count of 1's occurring within the last k bits of a stream window N. Which of the following statements is true about the estimate of the number of 0's based on DGIM? a) The number of 0's cannot be estimated at all. b) The number of 0's can be estimated with a maximum guaranteed error. c) To estimate the number of 0's and 1's with a guaranteed maximum error, DGIM has to be employed twice, once creating buckets based on 1's and once creating buckets based on 0's. d) Only 1’s can be estimated, not 0’s e) None of these
72. What are DGIM’s maximum error boundaries? a) DGIM always underestimates the true count; at most by 25% b) DGIM either underestimates or overestimates the true count; at most by 50% c) DGIM always overestimates the count; at most by 50% d) DGIM either underestimates or overestimates the true count; at most by 25% e) None of these
73. Which algorithm should be used to approximate the number of distinct elements in a data stream? a) Misra-Gries b) Alon-Matias-Szegedy c) DGIM d) Apriori e) None of these
74. Which of the following statements about standard Bloom filters is correct? a) It is possible to delete an element from a Bloom filter. b) A Bloom filter always returns the correct result. c) It is possible to alter the hash functions of a full Bloom filter to create more space. d) A Bloom filter always returns TRUE when testing for a previously added element. e) None of these
75. ETL stands for ________________ a) Extraction transformation and loading b) Extract Taken Lend c) Enterprise Transfer Load d) Entertainment Transference Load e) None of these
76. Which of the following is not a major data analysis approaches? a) Data Mining b) Predictive Intelligence c) Business Intelligence d) Text Analytics e) None of these
77. What do you mean by a Real Time Analytics platform? a) Manages and processes data and helps timely decision making b) Helps to develop dynamic analysis applications c) Leads to evolution of non-business intelligence d) Hadoop e) None of these
78. Data Analysis is defined by the statistician? a) William S. b)Hans Peter Luhn c) Gregory Piatetsky-Shapiro d) John Tukey e)None of these
79. Which of the following is a wrong statement? a) The big volume actually represents Big Data b) Big Data is just about tons of data c) The data growth and social media explosion have changed how we look at the data d) All of these e) None of these
80. Which of the following emphasizes the discovery of previously unknown properties of the data? a) Machine Learning b) Big Data c) Data wrangling d) Data mining e) None of these
81 What are DGIM’s maximum error boundaries? a)DGIM always underestimates the true count; at most by 25% b)DGIM either underestimates or overestimates the true count; at most by 50% c)DGIM always overestimates the count; at most by 50% d)DGIM either underestimates or overestimates the true count; at most by 25% e)None of these
82 A Bloom filter guarantees no a)false positives b)false negatives c)false positives and false negatives d)false positives or false negatives, depending on the Bloom filter e)None of these
83. Which of the following statements about the standard DGIM algorithm are false? a) DGIM operates on a time-based window. b) In DGIM, the size of a bucket is always a power of two. c) The maximum number of buckets has to be chosen beforehand. d) The buckets contain the count of 1's and each 1's specific position in the stream. e)None of these
84 What are two differences between large-scale computing and big data processing? a)hardware b) Data is more suitable for finding new patterns in data than Large Scale Computing c) amount of processing time available d) amount of data processed e)None of these
85. In the Flajolet-Martin algorithm, if the stream contains n elements with m of them unique, the algorithm runs in a) O(n) time b) constant time c) O(2n) time d) O(3n) time e) None of these
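As a rough illustration of questions 70 and 85, the sketch below is a single-hash Flajolet-Martin style estimator in Python (the hash choice and the single-estimator setup are simplifying assumptions; practical implementations combine many estimators). It makes one pass over the stream, so the work is O(n), keeps only the largest number of trailing zero bits seen, and returns 2^R as the distinct-count estimate.

```python
import hashlib

def trailing_zeros(n):
    # Number of trailing zero bits in a positive integer (0 is treated as 0 here).
    count = 0
    while n > 0 and n & 1 == 0:
        n >>= 1
        count += 1
    return count

def fm_estimate(stream):
    """One-pass, constant-memory Flajolet-Martin style distinct-count estimate."""
    max_r = 0
    for item in stream:
        h = int(hashlib.sha256(str(item).encode()).hexdigest(), 16)
        max_r = max(max_r, trailing_zeros(h))
    return 2 ** max_r  # rough estimate of the number of distinct elements

stream = [1, 2, 3, 2, 1, 4, 5, 3, 2, 6]
print(fm_estimate(stream))  # estimate for the 6 distinct values (noisy with one hash)
```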
86 What are two differences between large-scale computing and big data processing? a) hardware b) Data is more suitable for finding new patterns in data than Large Scale Computing c) amount of processing time available d) number of passes made over the data e)None of these
87. What does it mean when an algorithm is said to 'scale well'? a) The running time does not increase exponentially when data becomes longer. b) The result quality goes up when the data becomes larger. c) The memory usage does not increase exponentially when data becomes larger. d) The result quality remains the same when the data becomes larger. e) None of these
89. The FM-sketch algorithm can be used to: a) Estimate the number of distinct elements. b) Sample data with a time-sensitive window. c) Estimate the frequent elements. d) Determine whether an element has already occurred in previous stream data. e) None of these
90. Which attribute is not indicative of data streaming? a) Limited amount of memory b) Limited amount of processing time c) Limited amount of input data d) Limited amount of processing power e) None of these
UNIT 4
91 Which of the following clustering type has characteristic shown in the below figure?
a) Exploratory b) Inferential c) Causal d) Hierarchical Clustering e)None of these
92 Which of the following dimension type graph is shown in the below figure?
a) one-dimensional b) two-dimensional c) three-dimensional d) four-dimensional e)None of these
93 Which of the following gave rise to need of graphs in data analysis? a)Data visualization b) Communicating results
c) Decision making d) All of the mentioned e)None of these
94. Which of the following is a characteristic of an exploratory graph? a) Made slowly b) Axes are not cleaned up c) Color is used for personal information d) All of the mentioned e) None of these
95. Color and shape are used to add dimensions to graph data. a) True b) False c) Dilemma d) Incorrect Statement e) None of these
96.Which of the following information is not given by five-number summary? a) Mean b) Median c) Mode d) All of the mentioned e)None of these
97.Which of the following is also referred to as overlayed 1D plot? a)lattice b) barplot c) gplot d) all of the mentioned e)None of these
98.Spinning plots can be used for two dimensional data. a)True b) False c)Incorrect d)Not Sure e)None of these
99 Point out the correct statement. a) coplots are one dimensional data graph b) Exploratory graphs are made quickly c) Exploratory graphs are made relatively less in number d) All of the mentioned e)None of these
100. Which of the following clustering techniques is used by the K-Means algorithm? a) Hierarchical technique b) Partitional technique c) Divisive
d) Agglomerative e) None of these
101. SON algorithm is also known as a) PCY Algorithm b) Multistage Algorithm c) Multihash Algorithm d) Partition Algorithm e) None of these
102. Which technique is used to filter unnecessary itemsets in the PCY algorithm? a) Association Rule b) Hashing Technique c) Data Mining d) Market basket e) None of these
103. In association rule mining, which of the following indicates how frequently the items occur in a dataset? a) Support b) Confidence c) Basket d) Itemset e) None of these
104. Which term indicates the degree of correlation between X and Y in a dataset, if the given association rule is X --> Y? a) Confidence b) Monotonicity c) Distinct d) Hashing e) None of these
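Questions 103 and 104 rest on two simple ratios. The toy Python sketch below (the transactions are invented purely for illustration) computes support as the fraction of baskets containing an itemset, and confidence of X -> Y as support(X ∪ Y) / support(X).

```python
# Toy market baskets, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "eggs", "milk"},
    {"eggs", "milk"},
    {"bread", "eggs"},
]

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    # confidence(X -> Y) = support(X union Y) / support(X)
    return support(set(lhs) | set(rhs)) / support(lhs)

print(support({"bread", "milk"}))       # 0.5  (2 of 4 baskets)
print(confidence({"bread"}, {"milk"}))  # 0.666...: how often milk accompanies bread
```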
105.During start up, the ___________ loads the file system state from the fsimage and the edits log file. a) DataNode b) NameNode c) ActionNode d) Data Action Node e)None of these
106. Which of the following scenarios may not be a good fit for HDFS? a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file b) HDFS is suitable for storing data related to applications requiring low latency data access c) HDFS is suitable for storing data related to applications requiring low latency data access d) HDFS is suitable for scenarios requiring multiple/simultaneous writes to the same file e) None of these
107. ________ is the slave/worker node and holds the user data in the form of Data Blocks. a) DataNode b) NameNode c) Data block d) Replication e) None of these
108. HDFS provides a command line interface called __________ used to interact with HDFS. a) “HDFS Shell” b) “FS Shell” c) “DFS Shell” d) None of the mentioned e) None of these
109. What is CLIQUE? a) CLIQUE is a grid based method for finding density based clusters in subspaces. b) CLIQUE is a click method c) used to prune non-promising cells and to improve efficiency d) used to measure distance e) None of these
110. CLIQUE stands for? a) Clustering In QUEst b) Common in Quest c) Calculate in Quest d) Click in Quest e) None of these
111. What are the approaches for high dimensional data clustering? a) Subspace clustering b) Projected clustering and Biclustering c) Data Clustering d) Space Clustering e) None of these
112. Applications of frequent itemset analysis include a) Related concepts, Plagiarism, Biomarkers b) Clustering c) Design d) Operation e) None of these
113. k-means is a ……….. based algorithm, or distance based algorithm, where we calculate the distances to assign a point to a cluster. a) Centroid b) Distance c) Neuron d) Dendron e) None of these
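Question 113 describes the centroid/distance view of k-means. The short, library-free Python sketch below (the point values and k are made up for illustration) runs the two alternating steps: assign each point to its nearest centroid by squared Euclidean distance, then move each centroid to the mean of its assigned points.

```python
import random

def kmeans(points, k, iters=10, seed=0):
    """Minimal centroid-based k-means on 2-D points."""
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            dists = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[dists.index(min(dists))].append((x, y))
        # Update step: each centroid moves to the mean of its assigned points.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids

pts = [(1, 1), (1.2, 0.8), (0.9, 1.1), (8, 8), (8.2, 7.9), (7.8, 8.1)]
print(kmeans(pts, k=2))  # two centroids, near (1, 1) and (8, 8)
```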
114. -------- is an algorithm for frequent itemset mining and association rule learning over relational databases. a) Confidence b) Apriori c) Disadvantage d) Market basket e) None of these
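Question 114's answer, Apriori, works level by level using the apriori (downward-closure) property that question 118 also touches on: a k-itemset can only be frequent if every one of its (k-1)-subsets is frequent. A minimal Python sketch, with made-up baskets and a made-up minimum support:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining with downward-closure pruning."""
    n = len(transactions)
    support = lambda s: sum(s <= t for t in transactions) / n
    items = {i for t in transactions for i in t}
    # Pass 1: frequent 1-itemsets.
    levels = [{frozenset([i]) for i in items if support({i}) >= min_support}]
    k = 2
    while levels[-1]:
        prev = levels[-1]
        # Candidate generation: join frequent (k-1)-itemsets ...
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # ... and prune any candidate that has an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        levels.append({c for c in candidates if support(c) >= min_support})
        k += 1
    return [itemset for level in levels for itemset in level]

tx = [{"bread", "milk"}, {"bread", "eggs", "milk"}, {"eggs", "milk"}, {"bread", "eggs"}]
print(apriori(tx, min_support=0.5))
```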
115. The HBase database includes the Hadoop list, the Apache Mahout ________ system, and matrix operations. a) Statistical classification b) Pattern recognition c) Machine learning d) Artificial intelligence e) All of these
116. To discover interesting relations between objects in large databases is an objective of ---- a) Frequent Set Mining b) Market basket Mining c) Association rules mining d) Confidence Gain e) None of these
117. Different methods for storing itemset counts in main memory are a) The triangular matrix method b) The triples method c) Angular method d) Square Method e) None of these
118. ------ is used to prune non-promising cells and to improve efficiency. a) Market basket b) Frequent itemset c) Support d) Apriori property e) None of these
119. Identify the algorithm in which, on the first pass, we count the items themselves and determine which items are frequent; on the second pass, we count only the pairs of items both of which were found frequent on the first pass. a) DGIM b) CURE c) Pagerank d) Apriori e) None of these
120. A resource used for sharing data globally by all nodes is a) Distributed Cache b) Centralised Cache c) Secondary memory d) Primary memory e) None of these
UNIT-5
121. Input to the ________ is the sorted output of the mappers. a) Reducer b) Mapper c) Shuffle d) All of the above e) None of these
122. Which of the following statements about data streaming is true? a)Stream data is always unstructured data. b)Stream data often has a high velocity. c)Stream elements cannot be stored on disk. d)Stream data is always structured data. e)None of these
123. The output of the ________ is not sorted in the MapReduce framework for Hadoop. a) Mapper b) Cascader c) Scalding d) None of the above e) None of these
124. Which of the following phases occur simultaneously? a) Reduce and Sort b) Shuffle and Sort c) Shuffle and Map d) Sort and Reduce e) None of these
125.A ________ serves as the master and there is only one NameNode per cluster. a) Data Node b) NameNode c) Data block d) Replication e) None of these
126.HDFS works in a __________ fashion. a) master-worker b) master-slave c) worker/slave d) all of the mentioned e) None of these
127.________ NameNode is used when the Primary NameNode goes down. a) Rack b) Data c) Secondary d) None of the mentioned e) None of these
128 Point out the wrong statement. a) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file level b) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode c) User data is stored on the local file system of DataNodes d) DataNode is aware of the files to which the blocks stored on it belong to e) None of these
129 The need for data replication can arise in various scenarios like ____________ a) Replication Factor is changed b) DataNode goes down c) Data Blocks get corrupted d) All of the mentioned e) None of these
130. For YARN, the ___________ Manager UI provides host and port information. a) Data Node b) NameNode c) Resource d) Replication e) None of these
131. HDFS works in a __________ fashion. a) worker-master fashion b) master-slave fashion c) master-worker fashion d) slave-master e) None of these
132. HDFS is implemented in the _____________ language. a) C b) Perl c) Python d) Java e) None of these
133. The default block size in Hadoop is ______. a) 16MB b) 32MB c) 64MB d) 128MB e) None of these
134. ____ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data. a) MapReduce b) Mahout c) Oozie d) Hbase e) None of these
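Question 134's programming model is easiest to see on the classic word-count example. The Python sketch below is only a local simulation of the idea (no Hadoop involved; the helper names are invented): map emits (word, 1) pairs, a sort/group step stands in for shuffle-and-sort, and reduce sums the counts for each word.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    # Reduce: sum all counts emitted for the same word.
    return (word, sum(counts))

lines = ["the quick brown fox", "the lazy dog", "the fox"]

# Stand-in for the shuffle-and-sort phase: sort and group intermediate pairs by key.
intermediate = sorted(pair for line in lines for pair in mapper(line))
result = [reducer(word, [count for _, count in group])
          for word, group in groupby(intermediate, key=itemgetter(0))]
print(result)  # e.g. ('fox', 2) and ('the', 3) appear among the output pairs
```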
135. Mapper and Reducer implementations can use the ________ to report progress or just indicate that they are alive. a) Partitioner b) OutputCollector c) Reporter d) All of the above e) None of these
136. ________ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer. a) Partitioner b) OutputCollector c) Reporter d) All of the above e) None of these
137. A ________ serves as the master and there is only one NameNode per cluster. a) Data Node b) NameNode c) Data block d) Replication e) None of these
138. HDFS works in a ________ fashion. a) master-worker b) master-slave c) worker/slave d) All of the above e) None of these
139. ________ NameNode is used when the Primary NameNode goes down. a) Rack b) Data c) Secondary d) None e) None of these
140. HDFS is implemented in the ________ programming language. a) C++ b) Java c) Scala d) None e) None of these
141. Hadoop was developed by _______________ a) Larry Page b) Doug Cutting c) Mark d) Bill Gates e) None of these
142. The MapReduce algorithm contains two important tasks, namely __________. a) mapped, reduce b) mapping, Reduction c) Map, Reduction d) Map, Reduce e) None of these
143. Mapper and reducer classes extend classes from the ________ package. a) org.apache.hadoop.mapreduce b) apache.hadoop c) org.mapreduce d) hadoop.mapreduce e) None of these
144. HDFS is inherited from the ------------- file system. a) Yahoo b) FTFS c) Google d) Rediff e) None of these
145. ________ NameNode is used when the Primary NameNode goes down. a) Rack b) Data c) Secondary d) Primary e) None of these
146. HDFS works in a ________ fashion. a) master-worker b) master-slave c) worker/slave d) All of the above e) None of these
147. A ________ serves as the master and there is only one NameNode per cluster. a) Data Node b) NameNode c) Data block d) Replication e) None of these
148. HDFS provides a command line interface called ________ used to interact with HDFS. a) HDFS Shell b) FS Shell c) DFSA Shell d) No shell e) None of these
149. ________ is the slave/worker node and holds the user data in the form of Data Blocks. a) DataNode b) NameNode c) Data block d) Replication e) None of these
150. ________ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution. a) Map Parameters b) JobConf c) MemoryConf d) All of the above e) None of these
Data Analytics KIT-601 Answer key

UNIT-1   UNIT-2   UNIT-3    UNIT-4     UNIT-5
1-b      31-b     61-b      91-d       121-a
2-d      32-b     62-b      92-b       122-b
3-d      33-b     63-d      93-d       123-d
4-b      34-c     64-b      94-c       124-a
5-b      35-a     65-d      95-a       125-b
6-c      36-b     66-a      96-c       126-a
7-c      37-b     67-a      97-a       127-c
8-c      38-a     68-b      98-a       128-a
9-d      39-d     69-b      99-a       129-d
10-a     40-d     70-a      100-b      130-c
11-c     41-d     71-b      101-d      131-b
12-a     42-b     72-b      102-b      132-d
13-b     43-a     73-e      103-a      133-c
14-a     44-a     74-d      104-a      134-a
15-b     45-b     75-a      105-b      135-c
16-c     46-d     76-b      106-a,d    136-b
17-d     47-b     77-a,b    107-a      137-b
18-b     48-a     78-d      108-b      138-a
19-a     49-c     79-b      109-a      139-c
20-c     50-b     80-d      110-a      140-b
21-a     51-a     81-b      111-a,b    141-b
22-d     52-d     82-b      112-a      142-d
23-d     53-a     83-c,d    113-a      143-a
24-d     54-d     84-a,b    114-b      144-c
25-a     55-c     85-a      115-c,d    145-c
26-b     56-d     86-b      116-c      146-b
27-a     57-b     87-a,b    117-a,b    147-b
28-d     58-b     88-c      118-b      148-b
29-a     59-d     89-a,d    119-d      149-a
30-b     60-b     90-c      120-a      150-b
***************Data Analytics MCQs Set - 1***************
1. The branch of statistics which deals with development of particular statistical methods
is classified as
1. industry statistics
2. economic statistics
3. applied statistics
4. applied statistics
Answer: applied statistics
2. Which of the following is true about regression analysis?
1. answering yes/no questions about the data
2. estimating numerical characteristics of the data
3. modeling relationships within the data
4. describing associations within the data
Answer: modeling relationships within the data
3. Text Analytics, also referred to as Text Mining?
1. True
2. False
3. Can be true or False
4. Can not say
Answer: True
4. What is a hypothesis?
1. A statement that the researcher wants to test through the data collected in a study.
2. A research question the results will answer.
3. A theory that underpins the study.
4. A statistical method for calculating the extent to which the results could have happened by
chance.
Answer: A statement that the researcher wants to test through the data collected in a study.
5. What is the cyclical process of collecting and analysing data during a single research
study called?
1. Interim Analysis
2. Inter analysis
3. inter item analysis
4. constant analysis
Answer: Interim Analysis
6. The process of quantifying data is referred to as ____
1. Topology
2. Digramming
3. Enumeration
4. coding
Answer: Enumeration
7. An advantage of using computer programs for qualitative data is that they _
1. Can reduce time required to analyse data (i.e., after the data are transcribed)
2. Help in storing and organising data
3. Make many procedures available that are rarely done by hand due to time constraints
4. All of the above
Answer: All of the Above
8. Boolean operators are words that are used to create logical combinations.
1. True
2. False
Answer: True
9. ______ are the basic building blocks of qualitative data.
1. Categories
2. Units
3. Individuals
4. None of the above
Answer: Categories
10. This is the process of transforming qualitative research data from written interviews or field notes into typed text.
1. Segmenting
2. Coding
3. Transcription
4. Mnemoning
Answer: Transcription
11. A challenge of qualitative data analysis is that it often includes data that are unwieldy and complex; it is a major challenge to make sense of the large pool of data.
1. True
2. False
Answer: True
12. Hypothesis testing and estimation are both types of descriptive statistics.
1. True
2. False
Answer: False
13. A set of data organised in a participants(rows)-by-variables(columns) format is known as a “data set.”
1. True
2. False
Answer: True
14. A graph that uses vertical bars to represent data is called a ___
1. Line graph
2. Bar graph
3. Scatterplot
4. Vertical graph
Answer: Bar graph
15. ____ are used when you want to visually examine the relationship between two
quantitative variables.
1. Bar graph
2. pie graph
3. line graph
4. Scatterplot
Answer: Scatterplot
16. The denominator (bottom) of the z-score formula is
1. The standard deviation
2. The difference between a score and the mean
3. The range
4. The mean
Answer: The standard deviation
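A short worked example of the z-score formula behind question 16, with made-up numbers:

```latex
z = \frac{x - \mu}{\sigma}, \qquad
\text{e.g. } x = 85,\ \mu = 70,\ \sigma = 10
\;\Rightarrow\; z = \frac{85 - 70}{10} = 1.5
```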
17. Which of these distributions is used for testing a hypothesis?
1. Normal Distribution
2. Chi-Squared Distribution
3. Gamma Distribution
4. Poisson Distribution
Answer: Chi-Squared Distribution
18. A statement made about a population for testing purpose is called?
1. Statistic
2. Hypothesis
3. Level of Significance
4. Test-Statistic
Answer: Hypothesis
19. If the assumed hypothesis is tested for rejection considering it to be true is called?
1. Null Hypothesis
2. Statistical Hypothesis
3. Simple Hypothesis
4. Composite Hypothesis
Answer: Null Hypothesis
20. If the null hypothesis is false then which of the following is accepted?
1. Null Hypothesis
2. Positive Hypothesis
3. Negative Hypothesis
4. Alternative Hypothesis.
Answer: Alternative Hypothesis.
21. Alternative Hypothesis is also called as?
1. Composite hypothesis
2. Research Hypothesis
3. Simple Hypothesis
4. Null Hypothesis
Answer: Research Hypothesis
*************** Data Analytics MCQs Set – 2 ***************
1. What is the minimum no. of variables/ features required to perform clustering?
1. 0
2. 1
3. 2
4. 3
Answer: 1
2. For two runs of K-Mean clustering is it expected to get same clustering results?
1. Yes
2. No
Answer: No
3. Which of the following algorithm is most sensitive to outliers?
1. K-means clustering algorithm
2. K-medians clustering algorithm
3. K-modes clustering algorithm
4. K-medoids clustering algorithm
Answer: K-means clustering algorithm
4. The discrete variables and continuous variables are two types of
1. Open end classification
2. Time series classification
3. Qualitative classification
4. Quantitative classification
Answer: Quantitative classification
5. Bayesian classifiers is
1. A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
2. Any mechanism employed by a learning system to constrain the search space of a hypothesis
3. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
4. None of these
Answer: A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
6. Classification accuracy is
1. A subdivision of a set of examples into a number of classes
2. Measure of the accuracy, of the classification of a concept that is given by a certain theory
3. The task of assigning a classification to a set of examples
4. None of these
Answer: Measure of the accuracy, of the classification of a concept that is given by a certain theory
7. Euclidean distance measure is
1. A stage of the KDD process in which new data is added to the existing selection.
2. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
3. The distance between two points as calculated using the Pythagoras theorem
4. none of above
Answer: The distance between two points as calculated using the Pythagoras theorem
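For question 7, a quick worked instance of the Euclidean (Pythagorean) distance between two made-up points:

```latex
d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}, \qquad
p = (1, 2),\ q = (4, 6)
\;\Rightarrow\; d = \sqrt{3^2 + 4^2} = 5
```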
8. Hybrid is
1. Combining different types of method or information
2. Approach to the design of learning algorithms that is structured along the lines of the theory of evolution.
3. Decision support systems that contain an information base filled with the knowledge of an expert formulated in terms of if-then rules.
4. none of above
Answer: Combining different types of method or information
9. Decision trees use ________, in that they always choose the option that seems the best available at that moment.
1. Greedy Algorithms
2. divide and conquer
3. Backtracking
4. Shortest path algorithm
Answer: Greedy Algorithms
10. Discovery is
1. It is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information).
2. The process of extracting implicit, previously unknown and potentially useful information from data
3. An extremely complex molecule that occurs in human chromosomes and that carries genetic
information in the form of genes.
4. None of these
Answer: The process of extracting implicit, previously unknown and potentially useful information from data
11. Hidden knowledge referred to
1. A set of databases from different vendors, possibly using different database paradigms
2. An approach to a problem that is not guaranteed to work but performs well in most cases
3. Information that is hidden in a database and that cannot be recovered by a simple SQL query.
4. None of these
Answer: Information that is hidden in a database and that cannot be recovered by a simple SQL query.
12. Decision trees cannot handle categorical attributes with many distinct values, such as country codes for telephone numbers.
1. True
2. False
Answer: False
13. Enrichment is
1. A stage of the KDD process in which new data is added to the existing selection
2. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
3. The distance between two points as calculated using the Pythagoras theorem.
4. None of these
Answer: A stage of the KDD process in which new data is added to the existing selection
14. ________ are easy to implement and can execute efficiently even without prior knowledge of the data; they are among the most popular algorithms for classifying text documents.
1. ID3
2. Naive Bayes classifiers
3. CART
4. None of above
Answer: Naive Bayes classifiers
15. High entropy means that the partitions in classification are
1. Pure
2. Not Pure
3. Useful
4. useless
Answer: Not Pure
16. Which of the following statements about Naive Bayes is incorrect?
1. Attributes are equally important.
2. Attributes are statistically dependent of one another given the class value.
3. Attributes are statistically independent of one another given the class value.
4. Attributes can be nominal or numeric
Answer: Attributes are statistically dependent of one another given the class value.
17. The maximum value for entropy depends on the number of classes so if we have 8 Classes what will be the max entropy.
1. Max Entropy is 1
2. Max Entropy is 2
3. Max Entropy is 3
4. Max Entropy is 4
Answer: Max Entropy is 3
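The arithmetic behind question 17: with C equally likely classes the entropy is maximised at log2 C, so for 8 classes the maximum is 3 bits.

```latex
H_{\max} = \log_2 C, \qquad C = 8 \;\Rightarrow\; H_{\max} = \log_2 8 = 3 \text{ bits}
```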
18. Point out the wrong statement.
1. k-nearest neighbor is same as k-means
2. k-means clustering is a method of vector quantization
3. k-means clustering aims to partition n observations into k clusters
4. none of the mentioned
Answer: k-nearest neighbor is same as k-means
19. Consider the following example “How we can divide set of articles such that those articles have the same theme (we do not know the theme of the articles ahead of time) ” is this:
1. Clustering
2. Classification
3. Regression
4. None of these
Answer: Clustering
20. Can we use K Mean Clustering to identify the objects in video?
1. Yes
2. No
Answer: Yes
21. Clustering techniques are ________ in the sense that the data scientist does not determine, in advance, the labels to apply to the clusters.
1. Unsupervised
2. supervised
3. Reinforcement
4. Neural network
Answer: Unsupervised
22. The ________ metric is examined to determine a reasonably optimal value of k.
1. Mean Square Error
2. Within Sum of Squares (WSS)
3. Speed
4. None of these
Answer: Within Sum of Squares (WSS)
23. If an itemset is considered frequent, then any subset of the frequent itemset must also be frequent.
1. Apriori Property
2. Downward Closure Property
3. Either 1 or 2
4. Both 1 and 2
Answer: Both 1 and 2
24. If {bread, eggs, milk} has a support of 0.15 and {bread, eggs} also has a support of 0.15, the confidence of the rule {bread, eggs} => {milk} is
1. 0
2. 1
3. 2
4. 3
Answer: 1
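The calculation behind question 24, using the support values given in the question:

```latex
\mathrm{conf}(\{bread, eggs\} \Rightarrow \{milk\})
= \frac{\mathrm{supp}(\{bread, eggs, milk\})}{\mathrm{supp}(\{bread, eggs\})}
= \frac{0.15}{0.15} = 1
```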
25. Confidence is a measure of how X and Y are really related rather than coincidentally happening together.
1. True
2. False
Answer: False
26. ________ recommend items based on similarity measures between users and/or items.
1. Content Based Systems
2. Hybrid System
3. Collaborative Filtering Systems
4. None of these
Answer: Collaborative Filtering Systems
27. There are ________ major classifications of Collaborative Filtering Mechanisms.
1. 1
2. 2
3. 3
4. none of above
Answer: 2
28. Movie Recommendation to people is an example of
1. User Based Recommendation
2. Item Based Recommendation
3. Knowledge Based Recommendation
4. content based recommendation
Answer: Item Based Recommendation
29. ________ recommenders rely on an explicitly defined set of recommendation rules
1. Constraint Based
2. Case Based
3. Content Based
4. User Based
Answer: Case Based
30. Parallelized hybrid recommender systems operate dependently of one another and produce separate recommendation lists.
1. True
2. False
Answer: False
COURSE: B.Tech., VI SEM, MCQ Assignment (2020-21), Even Semester, UNIT 1, Data Analytics (KIT601)
1. The data with no pre-defined organizational form or specific format is
a. Semi-structured data b. Unstructured data c. Structured data d. None of these
Ans. b
2. The data which can be ordered or ranked according to some relationship to one another is
a. Categorical data b. Interval data c. Ordinal data d. Ratio data
Ans. c
3. Predict the future by examining historical data, detecting patterns or relationships in these data, and then extrapolating these relationships forward in time. a. Prescriptive model b. Descriptive model c. Predictive model d. None of these
Ans. c
4. Person responsible for the genesis of the project, providing the impetus for the project and core business problem, generally provides the funding and will gauge the degree of value from the final outputs of the working team is a. Business User b. Project Sponsor c. Business Intelligence Analyst d. Data Engineer
Ans. b
5. Deep technical skills to assist with tuning SQL queries for data management, extraction and support data ingest to analytic sandbox is handled by ___________. a. Data Engineer b. Business User c. Project Sponsor d. Business Intelligence Analyst
Ans. a
6. Business domain expertise with deep understanding of the data, KPIs, key metrics and business intelligence from a reporting perspective is key role of ____________.
a. Business User b. Project Sponsor c. Business Intelligence Analyst d. Data Engineer
Ans. c
7. _____________ is concerned with uncertainty or inaccuracy of the data.
a. Volume b. Velocity c. Variety d. Veracity
Ans. d
Ans. d
Ans. d
Ans. True
11. The process of exploring data and reports in order to extract meaningful insights, which can be used to better understand and improve business performance.
a. Reporting b. Analysis c. Summarizing d. None of these
Ans. b
Ans. a
8. What are the V’s in the characteristics of Big data? a. Volume b. Velocity c. Variety d. All of these
9. What are the types of reporting in data analytics?
a. Canned reports b. Dashboard reports c. Alert reports d. All of above
10.Massive Parallel Processing (MPP) database breaks the data into independent chunks with independent disk and CPU resources.
a. True b. False
12. The key components of an analytical sandbox are: (i) Business analytics (ii) Analytical sandbox platform (iii) Data access and delivery (iv) Data sources
a. True b. False
Ans. b
14. In which phase do you prepare an analytic sandbox, in which you can work for the duration of the project; perform ELT and ETL to get data into the sandbox, and begin transforming the data so you can work with it and analyze it; and familiarize yourself with the data thoroughly and take steps to condition the data?
a. Data preparation b. Discovery c. Data Modelling d. Data Building Ans. a
Ans.b
Ans. a
13. In the ____________ phase you learn the business domain, including relevant history, such as whether the organization or business unit has attempted similar projects in the past, from which you can learn. Assess the resources you will have to support the project, in terms of people, technology, time, and data. Frame the business problem as an analytic challenge that can be addressed in subsequent phases. Formulate initial hypotheses (IH) to test and begin learning the data. a. Data preparation b. Discovery c. Data Modelling d. Data Building
15. Which phase uses SQL, Python, R, or excel to perform various data modifications and transformations.
a. Data preparation b. Data cleaning c. Data Modelling d. Data Building
16. By definition, Database Administrator is a person who ___________
a. Provisions and configures database environment to support the analytical needs of the working team. b. Ensure key milestones and objectives are met on time and at expected quality. c. Deep technical skills to assist with tuning SQL queries for data management, extraction and support data ingest to analytic sandbox. d. None of these
Ans. a
Ans. c
Ans. b
Ans .b
17. ETL stands for
a. Extract, Load, Transform b. Evaluate, Transform ,Load c. Extract , Loss , Transform d. None of the above
18. The phase Develop data sets for testing, training, and production purposes. Get the best environment you can for executing models and workflows, including fast hardware and parallel processing is referred to as
a. Data preparation b. Discovery c. Data Modelling d. Data Building
19. Which of the following is not a major data analysis approaches?
a. Data Mining b. Predictive Intelligence c. Business Intelligence d. Text Analytics
20. User rating given to a movie in a scale 1-10, can be considered as an attribute of type?
a. Nominal b. Ordinal c. Interval d. Ratio
Ans. d
22. The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
a. TRUE b. FALSE c. Can be true or false d. Cannot say
Ans. a
Ans. b
Ans.b
25. The Process of describing the data that is huge and complex to store and process is known as
a. Analytics b. Data mining c. Big Data d. Data Warehouse
21. Data Analysis is defined by the statistician?
a. William S. b. Hans Peter Luhn c. Gregory Piatetsky-Shapiro d. John Tukey
23. Which of the following is not a major data analysis approaches?
a. Data Mining b. Predictive Intelligence c. Business Intelligence d. Text Analytics
24. Which of the following step is performed by data scientist after acquiring the data?
a. Data Cleansing b. Data Integration c. Data Replication d. All of the mentioned
Ans. c
26. Data generated from online transactions is one of the examples for volume of big data. Is this true or false? a. TRUE b. FALSE
Ans. a
27. Velocity is the speed at which the data is processed
a. TRUE b. FALSE
Ans. b
28. _____________ have a structure but cannot be stored in a database.
a. Structured b. Semi-Structured c. Unstructured d. None of these
Ans. b
29. ____________refers to the ability to turn your data useful for business.
a. Velocity b. Variety c. Value d. Volume
Ans. c
30. Value tells the trustworthiness of data in terms of quality and accuracy.
a. TRUE b. FALSE
Ans.b
NPTEL Questions
31. Analysing the data to answer why some phenomenon related to learning happened is a type of
a. Descriptive Analytics b. Diagnostic Analytics
c. Predictive Analytics d. Prescriptive Analytics
Ans. B
32. Analysing the data to answer what will happen next is a type of
a. Descriptive Analytics b. Diagnostic Analytics c. Predictive Analytics d. Prescriptive Analytics
Ans. D
33. Learning analytics at institutions/University, regional or national level is termed as
a. Educational data mining b. Business intelligence c. Academic analytics d. None of the above
Ans. C
34. Which of the following questions is not a type of Predictive Analytics?
a. What is the average score of all students in the CBSE 10th Maths Exam? b. What will be the performance of a students in next questions? c. Which courses will the student take in the next semester? d. What is the average attendance of the class over the semester
Ans A,D
35. A course instructor has data about students' attendance in her course in the past semester. Based on this data, she constructs a line graph. What type of analytics is she doing?
a. Descriptive Analytics b. Diagnostic Analytics c. Predictive Analytics d. Prescriptive Analytics
Ans. A
36. She then correlates the attendance with their final exam scores. She realizes that students who score 90% and above also have an attendance of more than 75%. What type of analytics is she doing?
a. Descriptive Analytics b. Diagnostic Analytics c. Predictive Analytics d. Prescriptive Analytics
Ans. B
38. Why should one not go for sampling?
a. Less costly to administer than a census. b. The person authorizing the study is comfortable with the sample. c. Because the research process is sometimes destructive d. None of the above
Ans. d
39. Stratified random sampling is a method of selecting a sample in which:
a. the sample is first divided into strata, and then random samples are taken from each stratum b. various strata are selected from the sample c. the population is first divided into strata, and then random samples are drawn from each stratum d. None of these alternatives is correct.
Ans. c
SET II
1. Data Analysis is defined by the statistician?
a. William S. b. Hans Peter Luhn c. Gregory Piatetsky-Shapiro d. John Tukey
Ans. d
2. What is classification?
a) deciding what features to use in a pattern recognition problem b) deciding what class an input pattern belongs to c) deciding what type of neural network to use d) none of the mentioned
Ans. B
3. Data in ___________ bytes size is called Big Data.
A. Tera B. Giga C. Peta D. Meta
Ans : C
Explanation: Data in Petabytes, i.e. 10^15 bytes in size, is called Big Data.
4. How many V's of Big Data are there?
A. 2 B. 3 C. 4 D. 5
Ans : D
Explanation: Big Data was defined by the “3Vs” but now there are “5Vs” of Big Data which are Volume, Velocity, Variety, Veracity, Value
5. Transaction data of the bank is?
A. structured data B. unstructured data C. Both A and B D. None of the above
Ans : A
Explanation: Data which can be saved in tables is structured data, like the transaction data of a bank.
6. In how many forms can Big Data be found?
A. 2 B. 3 C. 4 D. 5
Ans : B
Explanation: Big Data can be found in three forms: structured, unstructured and semi-structured.
7. Which of the following are benefits of Big Data processing?
A. Businesses can utilize outside intelligence while taking decisions B. Improved customer service C. Better operational efficiency D. All of the above
Ans : D
Explanation: All of the above are Benefits of Big Data Processing.
8. Which of the following is not a Big Data technology?
A. Apache Hadoop B. Apache Spark C. Apache Kafka D. Apache Pytarch
Ans : D
Explanation: Apache Pytarch is not a Big Data technology.
9. What percentage of the world's total data has been created just within the past two years?
A. 80% B. 85% C. 90% D. 95%
Ans : C
Explanation: 90% of the world's total data has been created just within the past two years.
10. Apache Kafka is an open-source platform that was created by?
A. LinkedIn B. Facebook
C. Google D. IBM
Ans : A
Explanation: Apache Kafka is an open-source platform that was created by LinkedIn in the year 2011.
11. What was Hadoop named after?
A. Creator Doug Cutting’s favorite circus act B. Cuttings high school rock band C. The toy elephant of Cutting’s son D. A sound Cutting’s laptop made during Hadoop development
Ans : C
Explanation: Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant.
12. What are the main components of Big Data?
A. MapReduce B. HDFS C. YARN D. All of the above
Ans : D
Explanation: All of the above are the main components of Big Data.
13. Point out the correct statement.
A. Hadoop needs specialized hardware to process the data B. Hadoop 2.0 allows live stream processing of real-time data C. In the Hadoop programming framework output files are divided into lines or records D. None of the above
Ans : B
Explanation: Hadoop batch-processes data distributed over a number of computers, ranging in the 100s and 1000s.
14. Which of the following fields come under the umbrella of Big Data?
A. Black Box Data B. Power Grid Data
C. Search Engine Data D. All of the above
Ans : D
Explanation: All of the above fields come under the umbrella of Big Data.
15. Which of the following is not an example of Social Media? 1. Twitter 2. Google 3. Instagram 4. Youtube
ANs: 2 (Google)
16. By 2025, the volume of digital data will increase to 1. TB 2. YB 3. ZB 4. EB Ans: 3 ZB
17. Data Analysis is a process of 1. inspecting data 2. cleaning data 3. transforming data 4. All of Above
Ans. 4 All of above
18. Which of the following is not a major data analysis approaches? 1. Data Mining 2. Predictive Intelligence 3. Business Intelligence 4. Text Analytics
Ans. 2 Predictive Intelligence
19. The Process of describing the data that is huge and complex to store and process is known as 1. Analytics 2. Data mining 3. Big data 4. Data warehouse
Ans. 3 Big data
20. In descriptive statistics, data from the entire population or a sample is summarized with ?
1. Integer descriptor 2. floating descriptor 3. numerical descriptor 4. decimal descriptor
Ans. 3 numerical descriptor
21. Data generated from online transactions is one of the examples of the volume of big data 1. TRUE 2. FALSE
TRUE
22. Velocity is the speed at which the data is processed 1. True 2. False
False
23. Value tells the trustworthiness of data in terms of quality and accuracy 1. TRUE 2. FALSE
False
24. Hortonworks was introduced by Cloudera and owned by Yahoo 1. True 2. False
False
25. ____ refers to the ability to turn your data into value for the business 1. Velocity 2. Variety 3. Value 4. Volume
Ans. 3 Value
26. Data Analysis is defined by the statistician? 1. William S. 2. Hans Peter Luhn 3. Gregory Piatetsky-Shapiro 4. John Tukey
Ans. 4 John Tukey
27. Files are divided into ____ sized Chunks. 1. Static 2. Dynamic 3. Fixed 4. Variable
Ans. 3 Fixed
28. _____ is an open source framework for storing data and running application on clusters of commodity hardware. 1. HDFS 2. Hadoop 3. MapReduce 4. Cloud
Ans. 2 Hadoop
29. ____ is a factor considered before adopting Big Data technology 1. Validation 2. Verification 3. Data 4. Design
Ans. 1 Validation
30. Which among the following is not a data mining and analytical application? 1. profile matching 2. social network analysis 3. facial recognition 4. Filtering
Ans. 4 Filtering
31. Which capability allows a storage subsystem to support massive data volumes of increasing size? 1. Extensibility 2. Fault tolerance 3. Scalability 4. High-speed I/O capacity
Ans. 3 Scalability
32. ______ is a programming model for writing applications that can process Big Data in parallel on multiple nodes.
1. HDFS 2. MAP REDUCE 3. HADOOP 4. HIVE Ans. MAP REDUCE
33. How many main statistical methodologies are used in data analysis?
A. 2 B. 3 C. 4 D. 5
Ans : A
Explanation: In data analysis, two main statistical methodologies are used Descriptive statistics and Inferential statistics.
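As a minimal illustration of the two methodologies (not part of the original question bank; the scores are made-up values and only the Python standard library is used): descriptive statistics summarize the sample itself, while inferential statistics use the sample to say something about the wider population, for example via a confidence interval.
import statistics, math

scores = [62, 71, 58, 90, 85, 77, 69, 73, 88, 65]   # hypothetical sample of exam scores

# Descriptive statistics: summarize the observed data with numerical descriptors.
mean = statistics.mean(scores)
stdev = statistics.stdev(scores)
print("sample mean:", mean, "sample std dev:", round(stdev, 2))

# Inferential statistics: estimate a population parameter from the sample,
# here a rough 95% confidence interval for the population mean (assumes normality).
margin = 1.96 * stdev / math.sqrt(len(scores))
print("approx. 95% CI for population mean:", (round(mean - margin, 2), round(mean + margin, 2)))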
34. The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
A. TRUE B. FALSE C. Can be true or false D. Can not say
Ans : A
Explanation: The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
35. The branch of statistics which deals with development of particular statistical methods is classified as 1. industry statistics 2. economic statistics 3. applied statistics 4. applied statistics
Ans. applied statistics
36. Point out the correct statement. a) Descriptive analysis is the first kind of data analysis performed b) Descriptions can be generalized without statistical modelling
c) Description and Interpretation are the same in descriptive analysis d) None of the mentioned
Answer: a Explanation: Descriptive analysis describes a set of data, and it is the first kind of data analysis performed on a data set.
37. What are the five V’s of Big Data?
A. Volume
B. Velocity
C. Variety
D. All the above
Answer: Option D
38. What are the main components of Big Data?
A. MapReduce
B. HDFS
C. YARN
D. All of these
Answer: Option D
39. What are the different features of Big Data Analytics?
A. Open-Source
B. Scalability
C. Data Recovery
D. All the above
Answer: Option D
40. Which of the following refers to the problem of finding abstracted patterns (or structures) in the unlabeled data?
A. Supervised learning
B. Unsupervised learning
C. Hybrid learning
D. Reinforcement learning
Answer: B
Explanation: Unsupervised learning is a type of machine learning algorithm that is generally used to find the hidden structured and patterns in the given unlabeled data.
41. Which one of the following refers to querying the unstructured textual data?
A. Information access
B. Information update
C. Information retrieval
D. Information manipulation
Answer: C
Explanation: Information retrieval refers to querying unstructured textual data. Information retrieval can also be understood as the activity (or process) of obtaining, from a huge collection of information, the resources that are relevant to the information required.
42. For what purpose, the analysis tools pre-compute the summaries of the huge amount of data?
A. In order to maintain consistency
B. For authentication
C. For data access
D. To obtain the queries response
Answer: d
Explanation: Whenever a query is fired, its response should be returned as quickly as possible. So, to speed up query responses, the analysis tools pre-compute summaries of the huge amount of data. For example, when a keyword is typed into a Google search, Google's analytical tools use pre-computed summaries of large amounts of data to provide a quick output related to that keyword.
43. Which one of the following statements is not correct about the data cleaning?
a. It refers to the process of data cleaning
b. It refers to the transformation of wrong data into correct data
c. It refers to correcting inconsistent data
d. All of the above
Answer: d
Explanation: Data cleaning is a process applied to a data set to remove noisy data and inconsistent data. It also involves transformation, in which wrong data is transformed into correct data. In other words, data cleaning is a kind of pre-processing in which the given set of data is prepared for the data warehouse.
44. Any data with unknown form or the structure is classified as _ data. a. Structured b. Unstructured c. Semi-structured d. None of above Ans. b
45.____ means relating to the issuing of reports. a. Analysis b. Reporting c. Reporting and Analysis d. None of the above
Ans. b
46. Veracity involves the reliability of the data; this is ________ due to the numerous data sources of big data a) Easy and difficult b) Easiness c) Demanding d) None of these
Ans. c
47. ____ is a process of defining the measurement of a phenomenon that is not directly measurable, though its existence is implied by other phenomena. a. Data preparation b. Model planning c. Communicating results d. Operationalization
Ans. d
48. _____data is data whose elements are addressable for effective analysis.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. a
49. ______data is information that does not reside in a relational database but that have some organizational properties that make it easier to analyze.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. b
50. ______data is a data which is not organized in a predefined manner or does not have a predefined data model, thus it is not a good fit for a mainstream relational database.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. c
51. There are ___ types of big data.
a. 2 b. 3 c. 4 d. 5
Ans. b
52. Google search is an example of _________ data.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. c
KIET Group of Institutions
Department of IT
Course: B.Tech., VI Sem, MCQ Assignment (2020-21), Even Semester, Unit 2, Data Analytics (KIT601)
1. Maximum aposteriori classifier is also known as: a. Decision tree classifier b. Bayes classifier c. Gaussian classifier d. Maximum margin classifier
Ans. B
2. Which of the following sentences is FALSE regarding regression?
a. It relates inputs to outputs. b. It is used for prediction. c. It may be used for interpretation. d. It discovers causal relationships.
Ans. d
3. Suppose you are working on stock market prediction, and you would like to predict the price of a particular stock tomorrow (measured in dollars).
You want to use a learning algorithm for this.
a. Regression b. Classification c. Clustering d. None of these
Ans. a
4. In binary logistic regression:
a. The dependent variable is divided into two equal subcategories. b. The dependent variable consists of two categories. c. There is no dependent variable. d. The dependent variable is continuous.
Ans. b
5. A fair six-sided die is rolled twice. What is the probability of getting 4 on the first roll and not getting 6 on the second roll?
a. 1/36 b. 5/36 c. 1/12 d. 1/9
Ans. b
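A quick way to check this answer: the two rolls are independent, so P(4 on the first roll) × P(not 6 on the second roll) = (1/6) × (5/6) = 5/36. The short check below (an illustrative sketch, not part of the original question bank; the trial count is arbitrary) confirms the value both exactly and empirically.
from fractions import Fraction
import random

# Exact probability: independent events multiply.
exact = Fraction(1, 6) * Fraction(5, 6)
print("exact:", exact)          # 5/36

# Monte Carlo check with a hypothetical number of trials.
trials = 100_000
hits = sum(1 for _ in range(trials)
           if random.randint(1, 6) == 4 and random.randint(1, 6) != 6)
print("simulated:", hits / trials)   # close to 5/36, about 0.139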
6. The parameter β0 is termed as intercept term and the parameter β1 is termed as slope parameter. These parameters are usually called as _________
a. Regressionists b. Coefficients c. Regressive d. Regression coefficients
Ans. d
7. ________ is a simple approach to supervised learning. It assumes that the dependence of Y on X1, X2… Xp is linear.
a. Gradient Descent b. Linear regression
c. Logistic regression d. Greedy algorithms
Ans. b
8. What makes the interpretation of conditional effects extra challenging in logistic regression?
a. It is not possible to model interaction effects in logistic regression b. The maximum likelihood estimation makes the results unstable c. The conditional effect is dependent on the values of all X-variables d. The results has to be raised by its natural logarithm.
Ans. c
9. If there were a perfect positive correlation between two interval/ratio variables, the Pearson's r test would give a correlation coefficient of:
a. - 0.328 b. +1 c. +0.328 d. – 1
Ans.b
10. Logistic Regression transforms the output probability to in a range of [0, 1]. Which of the following function is used for this purpose?
a. Sigmoid b. Mode c. Square d. All of these
Ans.a
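For context, the sigmoid (logistic) function squashes any real-valued score into the range [0, 1], which is why logistic regression uses it to turn a linear score into a probability. The snippet below is a minimal sketch in Python; the sample inputs are illustrative, not from the question bank.
import math

def sigmoid(z):
    # Maps any real number z to a value strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))

# Large negative scores go towards 0, large positive scores towards 1.
for z in (-5.0, 0.0, 5.0):
    print(z, "->", round(sigmoid(z), 4))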
12. Generally which of the following method(s) is used for predicting continuous dependent variable?
1. Linear Regression 2. Logistic Regression
a. 1 and 2
b. only 1 c. only 2 d. None of these
Ans.b
13. Mean of the set of numbers {1, 2, 3, 4, 5} is?
a. 2 b. 3 c. 4 d. 5
Ans.b
14. Name of a movie, can be considered as an attribute of type?
a. Nominal
b. Ordinal
c. Interval
d. Ratio
Ans.a
15. Let A be an example, and C be a class. The probability P(C) is known as:
a. Apriori probability
b. Aposteriori probability
c. Class conditional probability
d. None of the above
Ans.a
16. Consider two binary attributes X and Y. We know that the attributes are independent and probability P(X=1) = 0.6, and P(Y=0) = 0.4. What is the probability that both X and Y have values 1?
a. 0.06 b. 0.16 c. 0.26 d. 0.36
Ans. d
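Worked reasoning for this answer: independence means P(X=1, Y=1) = P(X=1) × P(Y=1), and P(Y=1) = 1 − P(Y=0) = 0.6, so the joint probability is 0.6 × 0.6 = 0.36. The short check below simply restates that arithmetic.
p_x1 = 0.6
p_y1 = 1 - 0.4          # P(Y=1) = 1 - P(Y=0)
print(p_x1 * p_y1)      # 0.36, because independence gives P(X=1, Y=1) = P(X=1) * P(Y=1)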
17. In regression the output is a. Discrete b. Continuous c. Continuous and always lie in same range d. May be discrete and continuous
Ans. b
18. The probabilistic model that finds the most probable prediction using the training data and the space of hypotheses to make a prediction for a new data instance is known as:
a. Concept learning b. Bayes optimal classifier c. EM algorithm d. Logistic regression
Ans. b
19. State whether the following statement is true or not: “In the Bayesian theorem, it is important to find the probability of both the events occurring simultaneously.”
a. True b. False
Ans. b
20. If the correlation coefficient is a positive value, then the slope of the regression line
a. can be either negative or positive
b. must also be positive c. can be zero d. cannot be zero
Ans. b
21. Which of the following is true about Naive Bayes?
a. Assumes that all the features in a dataset are equally important b. Assumes that all the features in a dataset are independent c. Assumes that all the features in a dataset are equally important and are independent. d. None of the above options
Ans. c
22. Previous probabilities in Bayes Theorem that are changed with help of new available information are classified as _______
a. independent probabilities b. posterior probabilities c. interior probabilities d. dependent probabilities
Ans. b
23. Which of the following methods do we use, to find the best fit line for data in Linear Regression?
a. Least Square Error b. Maximum Likelihood c. Logarithmic Loss d. Both A and B
Ans. a
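As an illustrative sketch of the least squares idea (assuming NumPy is available; the data values are made up), the best fit line is the one whose intercept and slope minimize the sum of squared vertical distances between the observed points and the line.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])     # hypothetical observations

# np.polyfit with degree 1 solves the least squares problem for a straight line,
# returning the slope (beta1) and intercept (beta0) that minimize sum((y - (b0 + b1*x))^2).
beta1, beta0 = np.polyfit(x, y, 1)
print("intercept:", round(beta0, 3), "slope:", round(beta1, 3))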
24. What is the consequence between a node and its predecessors while creating Bayesian network?
a. Conditionally dependent b. Dependent c. Conditionally independent d. Both a & b
Ans. c
25. Bayes rule can be used to __________ conditioned on one piece of evidence.
a. Solve queries b. Answer probabilistic queries c. Decrease complexity of queries d. Increase complexity of queries
Ans.b
26. Which of the following options is/are correct in reference to Bayesian Learning?
a. New instances can be classified by combining the predictions of multiple hypotheses, weighted by their probabilities. b. Bayesian methods can accommodate hypotheses that make probabilistic predictions. c. Each observed training example can incrementally decrease or increase the estimated probability that a hypothesis is correct. d. All of the mentioned
Ans. d
27. When is the cell said to be fired? a. if the potential of the body reaches a steady threshold value b. if there is an impulse reaction c. during the upbeat of the heart d. none of the mentioned
Ans. a
28. Which of the following is true about regression analysis?
a. answering yes/no questions about the data b. estimating numerical characteristics of the data c. modeling relationships within the data d. describing associations within the data
Ans.c
29. Suppose you are building an SVM model on data X. The data X can be error-prone, which means that you should not trust any specific data point too much. You want to build an SVM model with a quadratic kernel function (polynomial of degree 2) that uses the slack variable C as one of its hyperparameters. What would happen when you use a very large value of C?
a. We can still classify data correctly for given setting of hyper parameter C b. We cannot classify data correctly for given setting of hyper parameter C. c. Can’t Say
d. None of these
Ans. a
30. What is/are true about the kernel in SVM?
(a) The kernel function maps low dimensional data to a high dimensional space. (b) It is a similarity function.
a. The kernel function maps low dimensional data to a high dimensional space b. It is a similarity function c. The kernel function maps low dimensional data to a high dimensional space and it is a similarity function d. None of these
Ans. c
31. Suppose you have trained an SVM with a linear decision boundary. After training the SVM, you correctly infer that your SVM model is underfitting. Which of the following options would you be more likely to consider for the next iteration of the SVM? a. You want to increase your data points. b. You want to decrease your data points. c. You will try to calculate more variables. d. You will try to reduce the features.
Ans. c
32. Suppose you are using RBF kernel in SVM with high Gamma value. What does this signify?
a. The model would consider even far away points from hyperplane for modeling b. The model would consider only the points close to the hyperplane for modeling. c. The model would not be affected by distance of points from hyperplane for modeling. d. None of these
Ans.b
33. Which of the following can only be used when training data are linearly separable?
a. Linear Logistic Regression. b. Linear Soft margin SVM c. Linear hard-margin SVM d. Parzen windows.
Ans.c
34. Using the kernel trick, one can get non-linear decision boundaries using algorithms designed originally for linear models.
a. True b. False
Ans. a
35. Support vectors are the data points that lie closest to the decision surface.
a. True b. False
Ans. a
36. Which of the following statement is true for a multilayered perceptron?
a. Output of all the nodes of a layer is input to all the nodes of the next layer b. Output of all the nodes of a layer is input to all the nodes of the same layer c. Output of all the nodes of a layer is input to all the nodes of the previous layer d. Output of all the nodes of a layer is input to all the nodes of the output layer
Ans. a
37. Which of the following is/are true regarding an SVM?
a. For two dimensional data points, the separating hyperplane learnt by a linear SVM will be a straight line. b. In theory, a Gaussian kernel SVM cannot model any complex separating hyperplane. c. For every kernel function used in a SVM, one can obtain an equivalent closed form basis expansion. d. Overfitting in an SVM is not a function of number of support vectors.
Ans. a
38. The function of distance that is used to determine the weight of each training example in instance based learning is known as______________
a. Kernel Function b. Linear Function c. Binomial distribution d. All of the above
Ans. a
39. What is the name of the function in the following statement: “A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain value, it outputs a 1, otherwise it just outputs a 0”?
a. Step function b. Heaviside function c. Logistic function d. Binary function
Ans. b
40. Which of the following is true? (i) On average, neural networks have higher computational rates than conventional computers. (ii) Neural networks learn by example. (iii) Neural networks mimic the way the human brain works.
a. All of the mentioned are true b. (ii) and (iii) are true c. (i) and (ii) are true d. Only (i) is true
Ans. a
41. Which of the following is an application of NN (Neural Network)?
a. Sales forecasting b. Data validation c. Risk management d. All of the mentioned
Ans. d
42. A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain value, it outputs a 1, otherwise it just outputs a 0.
a. True b. False
Ans. a
43. In what ways can output be determined from activation value in ANN?
a. Deterministically
b. Stochastically c. both deterministically & stochastically d. none of the mentioned
Ans. c
45. In ANN, the amount of output of one unit received by another unit depends on what?
a. output unit b. input unit c. activation value d. weight
Ans. d
46. Function of dendrites in ANN is
a. receptors b. transmitter c. both receptor & transmitter d. none of the mentioned
Ans. a
47. Which of the following is true? (i) On average, neural networks have higher computational rates than conventional computers. (ii) Neural networks learn by example. (iii) Neural networks mimic the way the human brain works.
a. All of the mentioned are true b. (ii) and (iii) are true c. (i), (ii) and (iii) are true d. Only (i) is true
Ans. a
48. What is the name of the function in the following statement: “A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain value, it outputs a 1, otherwise it just outputs a 0”?
a. Step function b. Heaviside function
c. Logistic function d. Binary function
Ans. b
49. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the constant of proportionality equal to 2. The inputs are 4, 10, 5 and 20 respectively. The output will be
a. 238 b. 76 c. 119 d. 123
Ans. a
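Worked check for this answer: the weighted sum is 1·4 + 2·10 + 3·5 + 4·20 = 119, and a linear transfer function with proportionality constant 2 gives an output of 2 × 119 = 238.
weights = [1, 2, 3, 4]
inputs = [4, 10, 5, 20]

weighted_sum = sum(w * x for w, x in zip(weights, inputs))   # 4 + 20 + 15 + 80 = 119
output = 2 * weighted_sum                                    # linear transfer, constant of proportionality 2
print(output)                                                # 238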
50. Which of the following are real world applications of the SVM?
a. Text and Hypertext Categorization b. Image Classification c. Clustering of News Articles d. All of the above
Ans.d
51. Support vector machine may be termed as:
a. Maximum apriori classifier
b. Maximum margin classifier
c. Minimum apriori classifier
d. Minimum margin classifier
Ans.b
52. What is purpose of Axon? a. receptors b. transmitter c. transmission d. none of the mentioned
53. The model developed from sample data having the form of ŷ = b0 + b1X is known as:
Ans. c (estimated regression equation)
54. In regression analysis, which of the following is not a required assumption about the error term ε?
Ans. a (The expected value of the error term is one)
55. ____________ are algorithms that learn from their more complex environments (hence eco) to generalize, approximate and simplify solution logic.
a. Fuzzy Relational DB
b. Ecorithms
c. Fuzzy Set
d. None of the mentioned
Ans. b
56. The truth values of traditional set theory is ____________ and that of fuzzy set is __________
a. Either 0 or 1, between 0 & 1
b. Between 0 & 1, either 0 or 1
c. Between 0 & 1, between 0 & 1
d. Either 0 or 1, either 0 or 1
Ans. a
57. What is the form of Fuzzy logic?
a. Two-valued logic
b. Crisp set logic
c. Many-valued logic
d. Binary set logic
Ans. c
58. Fuzzy logic is usually represented as ___________
a. IF-THEN rules
b. IF-THEN-ELSE rules
c. Both IF-THEN-ELSE rules & IF-THEN rules
d. None of the mentioned
Ans. a
59. ______________ is/are the way/s to represent uncertainty.
a. Fuzzy Logic
b. Probability
c. Entropy
d. All of the mentioned
Ans.d
60. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the following.
a. AND
b. OR
c. NOT
d. All of mentioned
Ans. d
61. The values of the set membership is represented by ___________
a. Discrete Set
b. Degree of truth
c. Probabilities
d. Both Degree of truth & Probabilities
Ans. b
62. Fuzzy logic is extension of Crisp set with an extension of handling the concept of Partial Truth.
a. True
b. False
Ans. a
SET II
1. Sentiment Analysis is an example of: 1. Regression 2. Classification 3. Clustering 4. Reinforcement Learning
Options: 1. 1, 2 and 4 2. 1, 2 and 3 3. 1 and 3 4. 1 and 2
Ans. 1 (1, 2 and 4)
2. The self-organizing maps can also be considered as the instance of _________ type of learning.
A. Supervised learning B. Unsupervised learning C. Missing data imputation D. Both A & C
Answer: B Explanation: The Self Organizing Map (SOM), or the Self Organizing Feature Map is a kind of Artificial Neural Network which is trained through unsupervised learning.
3. The following statement can be considered as an example of _________
Suppose one wants to predict the number of newborns according to the size of storks' population by performing supervised learning
A. Structural equation modeling B. Clustering C. Regression D. Classification
Answer: C
Explanation: The above-given statement can be considered as an example of regression. Therefore the correct answer is C.
4. In the example predicting the number of newborns, the final number of total newborns can be considered as the _________
A. Features B. Observation C. Attribute
D. Outcome
Answer: D
Explanation: In the example of predicting the total number of newborns, the result is represented as the outcome. Therefore, the total number of newborns will be found in, or addressed by, the outcome.
5. Which of the following statement is true about the classification?
A. It is a measure of accuracy B. It is a subdivision of a set C. It is the task of assigning a classification D. None of the above
Answer: B
Explanation: The term "classification" refers to the classification of the given data into certain sub-classes or groups according to their similarities or on the basis of the specific given set of rules.
6. Which one of the following correctly refers to the task of the classification?
A. A measure of the accuracy, of the classification of a concept that is given by a certain theory B. The task of assigning a classification to a set of examples C. A subdivision of a set of examples into a number of classes D. None of the above
Answer: C
Explanation: The task of classification refers to subdividing a set of examples into a number of classes. Therefore the correct answer is C.
7. _____is an observation which contains either very low value or very high value in comparison to other observed values. It may hamper the result, so it should be avoided. a. Dependent Variable b. Independent Variable c. Outlier Variable d. None of the above Ans. c
8. _______is a type of regression which models the non-linear dataset using a linear model.
a. Polynomial Regression b. Logistic Regression c. Linear Regression d. Decision Tree Regression
Ans. a
9. The prediction of the weight of a person when his height is known, is a simple example of regression. The function used in R language is_____.
a. lm() b. print() c. predict() d. summary()
Ans. c
10. There is the following syntax of lm() function in multiple regression.
lm(y ~ x1+x2+x3...., data) a. y is predictor and x1,x2,x3 are the dependent variables. b. y is dependent and x1,x2,x3 are the predictors. c. data is predictor variable. d. None of the above.
Ans. b
11. _______is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph.
a. A Bayesian network b. Bayes Network c. Bayesian Model d. All of the above
Ans. d
12. In support vector regression, _____is a function used to map lower dimensional data into higher dimensional data
A) Boundary line B) Kernel C) Hyper Plane D) Support Vector Ans. B
13. If the independent variables are highly correlated with each other, then such a condition is called ___________ a) outlier b) Multicollinearity c) under fitting d) independent variable
Ans. b
14. The Bayesian network graph does not contain any cyclic graph. Hence, it is known as a ____ or_____.
a. Directed Acyclic Graph or DAG b. Directed Cyclic Graph or DCG. c. Both the above. d. None of the above.
Ans. a
15. The hyperplane with maximum margin is called the ______ hyperplane. a. Non-optimal b. Optimal c. None of the above d. Requires one more option
Ans. b
16. One more _____ is needed for non-linear SVM.
a. Dimension b. Attribute c. Both the above d. None of the above
Ans. a
17. A subset of the dataset used to train the machine learning model, for which we already know the output, is called the
a. Training set b. Test set c. Both the above
d. None of the above
Ans. a
18. ______ is the final step of data preprocessing in machine learning. It is a technique to standardize the independent variables of the dataset in a specific range. In ______, we put our variables in the same range and on the same scale so that no variable dominates the others.
a. Feature Sampling b. Feature Scaling c. None of the above d. Both the above
Ans. b
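A minimal sketch of feature scaling (min-max normalization) in plain Python; the feature values are hypothetical. After scaling, every feature lies in the same [0, 1] range, so no single feature dominates the others.
def min_max_scale(values):
    # Rescale a list of numbers to the range [0, 1].
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [18, 25, 40, 60]                      # hypothetical feature on a small scale
salaries = [20000, 35000, 80000, 120000]     # hypothetical feature on a large scale
print(min_max_scale(ages))
print(min_max_scale(salaries))               # both features now share the same 0-1 range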
19. Principal components analysis (PCA) is a statistical technique that allows identifying underlying linear patterns in a data set so it can be expressed in terms of other data set of a significantly ____ dimension without much loss of information. a. Lower b. Higher c. Equal d. None of the above
Ans. a
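For illustration only (assuming scikit-learn is installed; the toy data is made up), PCA projects correlated 3-dimensional points onto the 2 directions of greatest variance, giving a lower-dimensional representation with little loss of information.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x = rng.normal(size=100)
# Three columns, two of which are strongly correlated with each other.
data = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=100), rng.normal(size=100)])

pca = PCA(n_components=2)          # keep the 2 directions with the most variance
reduced = pca.fit_transform(data)  # 100 x 3 -> 100 x 2
print(reduced.shape, pca.explained_variance_ratio_)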
20. _____ units which are internal to the network and do not directly interact with the environment. a. Input b. Output c. Hidden d. None of the above
Ans. c
21. In a ____ network there is an ordering imposed on the nodes: if there is a connection from unit a to unit b, then there cannot be a connection from b to a. a. Feedback b. Feed-Forward c. None of the above
Ans. b
22. _____ contains the multiple logical values and these values are the truth values of a variable or problem between 0 and 1. This concept was introduced by Lofti Zadeh in 1965 a. Boolean Logic b. Fuzzy Logic c. None of the above
Ans. b
23. ______is a module or component, which takes the fuzzy set inputs generated by the Inference Engine, and then transforms them into a crisp value. a. Fuzzification b. Defuzzification c. Inference Engine d. None of the above
Ans. b
24. The most common application of time series analysis is forecasting future values of a numeric value using the ______ structure of the ____ a. Shares,data b. Temporal,data c. Permanent,data d. None of these
Ans. b
25. Identify the component of a time series a. Temporal b. Shares c. Trend d. Policymakers
Ans. c
26. Predictable pattern that recurs or repeats over regular intervals. Seasonality is often observed within a year or less: This define the term__________ a. Trend b. Seasonality c. Cycles d. Recession
Ans. b
27. ________Learning uses a training set that consists of a set of pattern pairs: an input pattern and the corresponding desired (or target) output pattern. The desired output may be regarded as the ‘network’s ‘teacher” for that input a. Unsupervised b. Supervised c. Modular d. Object
Ans. b
28. The _______ perceptron consists of a set of input units connected by a single layer of weights to a set of output units a. Multi layer b. Single layer c. Hidden layer d. None of these
Ans. b
29. If we add another layer of weights to a single layer perceptron, then we find that there is a new set of units that are neither input nor output units; for simplicity, a network with more than 2 layers of this kind is called a a. Single layer perceptron b. Multi layer perceptron c. Hidden layer d. None of these
Ans. b
30. Patterns that repeat over a certain period of time a. Seasonal b. Trend c. None of the above d. Both of the above
Ans. a
31. Which of the following is characteristic of best machine learning method ?
a. Fast b. Accuracy c. Scalable d. All of the Mentioned
Ans. d
32. Supervised learning differs from unsupervised clustering in that supervised learning requires a. at least one input attribute. b. input attributes to be categorical. c. at least one output attribute. d. output attributes to be categorical. Ans. c
33. Supervised learning and unsupervised clustering both require at least one a. hidden attribute. b. output attribute. c. input attribute. d. categorical attribute. Ans. c
34. Which statement is true about prediction problems? a. The output attribute must be categorical. b. The output attribute must be numeric. c. The resultant model is designed to determine future outcomes. d. The resultant model is designed to classify current behavior. Ans. c
35. Which statement is true about neural network and linear regression models? a. Both models require input attributes to be numeric. b. Both models require numeric attributes to range between 0 and 1. c. The output of both models is a categorical attribute value. d. Both techniques build models whose output is determined by a linear sum of weighted input attribute values. Ans. a
36. A feed-forward neural network is said to be fully connected when a. all nodes are connected to each other. b. all nodes at the same layer are connected to each other. c. all nodes at one layer are connected to all nodes in the next higher layer. d. all hidden layer nodes are connected to all output layer nodes. Ans. c
37. Machine learning techniques differ from statistical techniques in that machine learning methods a. typically assume an underlying distribution for the data.
b. are better able to deal with missing and noisy data. c. are not able to explain their behavior. d. have trouble with large-sized datasets. Ans. b
38. This supervised learning technique can process both numeric and categorical input attributes. a. linear regression b. Bayes classifier c. logistic regression d. backpropagation learning Ans. b
39. This technique associates a conditional probability value with each data instance. a. linear regression b. logistic regression c. simple regression d. multiple linear regression Ans. b
40. Logistic regression is a ________ regression technique that is used to model data having a _____outcome. a. linear, numeric b. linear, binary c. nonlinear, numeric d. nonlinear, binary Ans. d
41. Which of the following problems is best solved using time-series analysis? a. Predict whether someone is a likely candidate for having a stroke. b. Determine if an individual should be given an unsecured loan. c. Develop a profile of a star athlete. d. Determine the likelihood that someone will terminate their cell phone contract.
Ans. d
42. Which of the following is true about Naive Bayes? a. Assumes that all the features in a dataset are equally important b. Assumes that all the features in a dataset are independent
c. Both A and B d. None of the above options Ans. c
43. Simple regression assumes a __________ relationship between the input attribute and output attribute. a. linear b. quadratic c. reciprocal d. inverse
Ans. a
44. With Bayes classifier, missing data items are a. treated as equal compares. b. treated as unequal compares. c. replaced with a default value. d. ignored.
45. What is Machine learning? a. The autonomous acquisition of knowledge through the use of computer programs b. The autonomous acquisition of knowledge through the use of manual programs c. The selective acquisition of knowledge through the use of computer programs d. The selective acquisition of knowledge through the use of manual programs
Ans: a
46. Automated vehicle is an example of ______ a. Supervised learning b. Unsupervised learning c. Active learning d. Reinforcement learning
Ans: a
47. Multilayer perceptron network is a. Usually, the weights are initially set to small random values b. A hard-limiting activation function is often used c. The weights can only be updated after all the training vectors have been presented d. Multiple layers of neurons allow for less complex decision boundaries than a single layer
Ans: a
48. Neural networks a. optimize a convex cost function b. cannot be used for regression as well as classification c. always output values between 0 and 1 d. can be used in an ensemble
Ans: d
49. In neural networks, nonlinear activation functions such as sigmoid, tanh, and ReLU a. speed up the gradient calculation in backpropagation, as compared to linear units b. are applied only to the output units c. help to learn nonlinear decision boundaries d. always output values between 0 and 1
Ans: c
50. Which of the following is a disadvantage of decision trees?
a. Factor analysis b. Decision trees are robust to outliers c. Decision trees are prone to be overfit d. None of the above
Ans: c
51. Back propagation is a learning technique that adjusts weights in the neural network by propagating weight changes. a. Forward from source to sink b. Backward from sink to source c. Forward from source to hidden nodes d. Backward from sink to hidden nodes
Ans: b
52. Identify the following activation function: φ(V) = Z + 1 / (1 + exp(–X·V + Y)), where Z, X, Y are parameters.
a. Step function b. Ramp function c. Sigmoid function
d. Gaussian function
Ans: c
53. An artificial neuron receives n inputs x1, x2, x3, ..., xn with weights w1, w2, ..., wn attached to the input links. The weighted sum _________________ is computed and passed on to a non-linear filter Φ, called the activation function, to release the output. a. Σ wi b. Σ xi c. Σ wi + Σ xi d. Σ wi * xi
Ans: d
54. With Bayes classifier, missing data items are a. treated as equal compares. b. treated as unequal compares. c. replaced with a default value. d. ignored.
Ans:b
55. Machine learning techniques differ from statistical techniques in that machine learning methods a. typically assume an underlying distribution for the data. b. are better able to deal with missing and noisy data. c. are not able to explain their behavior. d. have trouble with large-sized datasets.
Ans: b
56. Which of the following is true about Naive Bayes?
a. Assumes that all the features in a dataset are equally important b. Assumes that all the features in a dataset are independent c. Both a and b d. None of the above options
Ans: c
57. How many terms are required for building a Bayes model?
a. 1 b. 2 c. 3 d. 4
Ans: c
58. What does the Bayesian network provides? a. Complete description of the domain b. Partial description of the domain c. Complete description of the problem d. None of the mentioned
Ans: a
59. How the Bayesian network can be used to answer any query? a. Full distribution b. Joint distribution c. Partial distribution d. All of the mentioned
Ans: b
60. In which of the following learning the teacher returns reward and punishment to learner? a. Active learning b. Reinforcement learning c. Supervised learning d. Unsupervised learning
Ans: b
61. Which of the following is the model used for learning? a. Decision trees b. Neural networks c. Propositional and FOL rules d. All of the mentioned
Ans: d
KIET Group of Institutions
Department of IT
Course: B.Tech., VI Sem, MCQ Assignment (2020-21), Even Semester, Unit 3, Data Analytics (KIT601)
Q.1 Which attribute is _not_ indicative for data streaming?
A) Limited amount of memory
B) Limited amount of processing time
C) Limited amount of input data
D) Limited amount of processing power
Ans. C
Q.2 Which of the following statements about data streaming is true?
A) Stream data is always unstructured data.
B) Stream data often has a high velocity.
C) Stream elements cannot be stored on disk.
D) Stream data is always structured data.
Ans. B
Q.3 What is the main difference between standard reservoir sampling and min-wise sampling?
A) Reservoir sampling makes use of randomly generated numbers whereas minwise sampling does not.
B) Min-wise sampling makes use of randomly generated numbers whereas reservoir sampling does not.
C) Reservoir sampling requires a stream to be processed sequentially, whereas minwise does not.
D) For larger streams, reservoir sampling creates more accurate samples than minwise sampling.
Ans. C)
Q.4 A Bloom filter guarantees no
A) false positives
B) false negatives
C) false positives and false negatives
D) false positives or false negatives, depending on the Bloom filter type
Ans. B)
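To see why a Bloom filter can produce false positives but never false negatives, the toy sketch below (illustrative only; the hash choices and parameters are assumptions, not a production design) sets k bit positions for every inserted element, so a membership test on an inserted element always finds all of its bits set, while an unseen element may accidentally collide with bits set by others.
import hashlib

class BloomFilter:
    def __init__(self, m=64, k=3):
        self.m, self.k = m, k
        self.bits = [0] * m                      # bit array initialized to all 0s

    def _positions(self, item):
        # Derive k positions from a cryptographic hash (a simple illustrative choice).
        digest = hashlib.sha256(item.encode()).hexdigest()
        return [int(digest[i*8:(i+1)*8], 16) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # TRUE only means "maybe present" (possible false positive);
        # FALSE means definitely "not present" (no false negatives).
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"))    # always True for a previously added element
print(bf.might_contain("mallory"))  # usually False, occasionally a false positive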
Q.5 Which of the following statements about standard Bloom filters is correct?
A) It is possible to delete an element from a Bloom filter.
B) A Bloom filter always returns the correct result.
C) It is possible to alter the hash functions of a full Bloom filter to create more space.
D) A Bloom filter always returns TRUE when testing for a previously added element.
Ans. D)
Q.6 The DGIM algorithm was developed to estimate the count of 1's occurring within the last k bits of a stream window N. Which of the following statements is true about the estimate of the number of 0's based on DGIM?
A) The number of 0's cannot be estimated at all.
B) The number of 0's can be estimated with a maximum guaranteed error.
C) To estimate the number of 0's and 1's with a guaranteed maximum error, DGIM has to be employed twice: once creating buckets based on 1's, and once creating buckets based on 0's.
D) None of these
Ans. B)
Q.7 Which of the following statements about the standard DGIM algorithm are false?
A)DGIM operates on a time-based window.
B) DGIM reduces memory consumption through a clever way of storing counts.
C) In DGIM, the size of a bucket is always a power of two.
D) The maximum number of buckets has to be chosen beforehand.
Ans. D)
Q.8 Which of the following statements about the standard DGIM algorithm are false?
A)DGIM operates on a time-based window.
B) DGIM reduces memory consumption through a clever way of storing counts.
C) In DGIM, the size of a bucket is always a power of two.
D) The buckets contain the count of 1's and each 1's specific position in the stream
Ans. D)
Q.9 What are DGIM’s maximum error boundaries? A) DGIM always underestimates the true count; at most by 25%
B) DGIM either underestimates or overestimates the true count; at most by 50%
C) DGIM always overestimates the count; at most by 50%
D) DGIM either underestimates or overestimates the true count; at most by 25%
Ans. B)
Q.10 Which algorithm should be used to approximate the number of distinct elements in a data stream?
A) Misra-Gries
B) Alon-Matias-Szegedy
C) DGIM
D) None of the above
Ans. D)
Q.11 Which algorithm should be used to approximate the number of distinct elements in a data stream?
A) Misra-Gries
B) Alon-Matias-Szegedy
C) DGIM
D) Flajolet and Martin
Ans. D)
Q.12 Which of the following streaming windows show valid bucket representations according to the DGIM rules?
A) 1 0 1 1 1 0 1 0 1 1 1 1 0 1 0 1
B) 1 0 1 1 1 0 0 0 0 1 1 0 0 0 1 0 1 1 1 0 0 1
C) 1 1 1 1 0 0 1 1 1 0 1 0 1
D) 1 0 1 1 0 0 0 1 0 1 1 1 0 1 1 0 0 1 0 1 1
Ans. D)
Q.13 For which of the following streams is the second-order moment F2 greater than 45?
A) 10 5 5 10 10 10 1 1 1 10
B) 10 10 10 10 10 5 5 5 5 5
C) 1 1 1 1 1 5 10 10 5 1
D) None of these
Ans. B)
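For reference, the second-order moment (surprise number) F2 is the sum over all distinct elements of the square of their occurrence counts; for option B, both 10 and 5 appear five times, so F2 = 5^2 + 5^2 = 50 > 45. The snippet below is a small illustrative check that computes F2 for each listed stream.
from collections import Counter

def second_moment(stream):
    # F2 = sum of squared frequencies of the distinct elements.
    return sum(c * c for c in Counter(stream).values())

streams = {
    "A": [10, 5, 5, 10, 10, 10, 1, 1, 1, 10],
    "B": [10, 10, 10, 10, 10, 5, 5, 5, 5, 5],
    "C": [1, 1, 1, 1, 1, 5, 10, 10, 5, 1],
}
for name, s in streams.items():
    print(name, second_moment(s))   # A: 38, B: 50, C: 44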
Q.14 For which of the following streams is the second-order moment F2 greater than 45?
A) 10 5 5 10 10 10 1 1 1 10
B) 10 10 10 10 10 10 10 10 10 10
C) 1 1 1 1 1 5 10 10 5 1
D) None of these
Ans. B)
Q 15 : In Bloom filter an array of n bits is initialized with
A) all 0s
B) all 1s
C) half 0s and half 1s
D) all -1
Ans. A)
Q 16. Pick a hash function h that maps each of the N elements to at least log2 N bits. If R is the maximum number of trailing 0's observed in any hash value, the estimated number of distinct elements is
A) 2^R
B) 2^(-R)
C) 1-(2^R)
D) 1-(2^(-R))
Ans. A)
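The idea behind this estimate is the Flajolet-Martin approach: hash every element, record R, the largest number of trailing zero bits seen in any hash value, and estimate the number of distinct elements as 2^R. The sketch below is illustrative only; the hash function choice and the sample stream are assumptions.
import hashlib

def trailing_zeros(n):
    # Count trailing zero bits of an integer (treat 0 as having no usable tail here).
    if n == 0:
        return 0
    count = 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count

def fm_estimate(stream):
    r = 0
    for item in stream:
        h = int(hashlib.md5(str(item).encode()).hexdigest(), 16)
        r = max(r, trailing_zeros(h))
    return 2 ** r            # estimated number of distinct elements

stream = [1, 2, 3, 2, 1, 4, 5, 3, 6, 7, 8, 2]   # hypothetical stream with 8 distinct values
print(fm_estimate(stream))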
Q.17 Sliding window operations typically fall in the category
A) OLTP Transactions
B) Big Data Batch Processing
C) Big Data Real Time Processing
D) Small Batch Processing
Ans. C)
Q.18 What is finally produced by Hierarchical Agglomerative Clustering?
A) final estimate of cluster centroids
B)assignment of each point to clusters
C) tree showing how close things are to each other
D) Group of clusters
Ans. C)
Q19 Which of the following algorithms can be used for counting 1's in a stream?
A) FM Algorithm
B) PCY Algorithm
C) DGIM Algorithm
D) SON Algorithm
Ans. C)
Q20 Which technique is used to filter unnecessary itemsets in the PCY algorithm?
A) Association Rule
B) Hashing Technique
C) Data Mining
D) Market basket
Ans. B)
Q21 In association rule, which of the following indicates the measure of how frequently the items occur in a dataset ?
A) Support B) Confidence C) Basket D) Itemset
Ans. A)
Q.22 Which of the following clustering techniques is used by the K-Means algorithm?
A) Hierarchical Technique
B) Partitional technique
C)Divisive
D) Agglomerative
Ans. B)
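As a brief illustration of why K-Means is a partitional technique (each point is assigned to exactly one of k flat clusters rather than to a hierarchy), here is a sketch assuming scikit-learn is available; the points are made up.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
                   [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])   # two obvious groups

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(km.labels_)            # each point gets exactly one cluster label (a partition)
print(km.cluster_centers_)   # one centroid per cluster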
Q.23 Which of the following clustering techniques is used by the Agglomerative Nesting (AGNES) algorithm?
A) Hierarchical Technique
B) Partitional technique
C) Density based
D) None of these
Ans. A)
Q24. Which of the following hierarchical approaches begins with each observation in a distinct (singleton) cluster, and successively merges clusters together until a stopping criterion is satisfied?
A) Divisive
B) Agglomerative
C) Single Link
D) Complete Link
Ans. B)
Q.25 The Park, Chen, Yu algorithm is useful for __________ in Big Data applications.
A) Find Frequent Itemset
B) Filtering Stream
C) Distinct Element Find
D) None of these
Ans. A)
Q.26 Match the following:
a) Bloom filter        i) Frequent Pattern Mining
b) FM Algorithm        ii) Filtering Stream
c) PCY Algorithm       iii) Distinct Element Find
d) DGIM Algorithm      iv) Counting 1's in window
A) a-ii), b-iii), c-i), d-iv)
B) a-iii), b-ii), c-i), d-iv)
C) a-i), b-iii), c-ii), d-iv)
D) None of these
Ans. A)
SET II
1. Which of the following can be considered as the correct process of Data Mining? a. Infrastructure, Exploration, Analysis, Interpretation, Exploitation b. Exploration, Infrastructure, Analysis, Interpretation, Exploitation c. Exploration, Infrastructure, Interpretation, Analysis, Exploitation d. Exploration, Infrastructure, Analysis, Exploitation, Interpretation
Answer: a
Explanation: The process of data mining contains many sub-processes in a specific order. The correct order in which all sub-processes of data mining executes is Infrastructure, Exploration, Analysis, Interpretation, and Exploitation.
2. Which of the following is an essential process in which the intelligent methods are applied to extract data patterns? a. Warehousing b. Data Mining c. Text Mining d. Data Selection
Answer: b
Explanation: Data mining is a type of process in which several intelligent methods are used to extract meaningful data from the huge collection (or set) of data.
3. What are the functions of Data Mining? a. Association and correlation analysis, classification b. Prediction and characterization
c. Cluster analysis and Evolution analysis d. All of the above
Answer: d
Explanation: In data mining, several functionalities are used for performing different types of tasks. The common functionalities used in data mining are cluster analysis, prediction, characterization, and evolution analysis; association and correlation analysis and classification are also important functionalities of data mining.
4. Which attribute is _not_ indicative for data streaming?
a. Limited amount of memory b. Limited amount of processing time c. Limited amount of input data d. Limited amount of processing power
Ans. c
5. Which of the following statements about data streaming is true?
a. Stream data is always unstructured data. b. Stream data often has a high velocity. c. Stream elements cannot be stored on disk. d. Stream data is always structured data.
Ans. b
6. Which of the following statements about sampling are correct? a. Sampling reduces the amount of data fed to a subsequent data mining algorithm b. Sampling reduces the diversity of the data stream c. Sampling increases the amount of data fed to a data mining algorithm d. Sampling algorithms often need multiple passes over the data
Ans. a
7. Which of the following statements about sampling are correct? a. Sampling reduces the diversity of the data stream
b. Sampling increases the amount of data fed to a data mining algorithm c. Sampling algorithms often need multiple passes over the data d. Sampling aims to keep statistical properties of the data intact
Ans. d
8. What is the main difference between standard reservoir sampling and min-wise sampling?
a. Reservoir sampling makes use of randomly generated numbers whereas min-wise sampling does not. b. Min-wise sampling makes use of randomly generated numbers whereas reservoir sampling does not. c. Reservoir sampling requires a stream to be processed sequentially, whereas min-wise does not. d. For larger streams, reservoir sampling creates more accurate samples than min-wise sampling.
Ans. c
9. A Bloom filter guarantees no
a. false positives b. false negatives c. false positives and false negatives d. false positives or false negatives, depending on the Bloom filter type
Ans. b
10. Which of the following statements about standard Bloom filters is correct?
a. It is possible to delete an element from a Bloom filter. b. A Bloom filter always returns the correct result. c. It is possible to alter the hash functions of a full Bloom filter to create more space. d. A Bloom filter always returns TRUE when testing for a previously added element.
Ans. d
11. The FM-sketch algorithm uses the number of zeros the binary hash value ends in to make an estimation. Which of the following statements is true about the hash tail?
a. Any specific bit pattern is equally suitable to be used as hash tail.
b. Only bit patterns with more 0's than 1's are equally suitable to be used as hash tails. c. Only the bit patterns 0000000..00 (list of 0s) or 111111..11 (list of 1s) are suitable hash tails. d. Only the bit pattern 0000000..00 (list of 0s) is a suitable hash tail.
Ans. a
12. The FM-sketch algorithm can be used to:
a. Estimate the number of distinct elements. b. Sample data with a time-sensitive window. c. Estimate the frequent elements. d. Determine whether an element has already occurred in previous stream data.
Ans. a
13. The DGIM algorithm was developed to estimate the count of 1's occurring within the last k bits of a stream window N. Which of the following statements is true about the estimate of the number of 0's based on DGIM?
a. The number of 0's cannot be estimated at all. b. The number of 0's can be estimated with a maximum guaranteed error. c. To estimate the number of 0s and 1s with a guaranteed maximum error, DGIM has to be employed twice, one creating buckets based on 1's, and once created buckets based on 0's. d. None of above
Ans. b
14. Which of the following statements about the standard DGIM algorithm are false? a. DGIM operates on a time-based window b. DGIM reduces memory consumption through a clever way of storing counts c. In DGIM, the size of a bucket is always a power of two d. The maximum number of buckets has to be chosen beforehand. Ans. d
15. Which of the following statements about the standard DGIM algorithm are false? a. DGIM operates on a time-based window b. The buckets contain the count of 1's and each 1's specific position in the stream c. DGIM reduces memory consumption through a clever way of storing counts
d. In DGIM, the size of a bucket is always a power of two Ans. b
16. What are DGIM’s maximum error boundaries?
a. DGIM always underestimates the true count; at most by 25% b. DGIM either underestimates or overestimates the true count; at most by 50% c. DGIM always overestimates the count; at most by 50% d. DGIM either underestimates or overestimates the true count; at most by 25%
Ans. b
17. Which algorithm should be used to approximate the number of distinct elements in a data stream?
a. Misra-Gries b. Alon-Matias-Szegedy c. DGIM d. None of the above
Ans. d
18. Which of the following statements about Bloom filters are correct?
a. A Bloom filter has the same properties as a standard Hashmap data structure in Java (java.util.HashMap). b. A Bloom filter is full if no more hash functions can be added to it. c. A Bloom filter always returns FALSE when testing for an element that was not previously added d. A Bloom filter always returns TRUE when testing for a previously added element
Ans. d
19. Which of the following statements about Bloom filters are correct?
a. An empty Bloom filter (no elements added to it) will always return FALSE when testing for an element b. A Bloom filter has the same properties as a standard Hashmap data structure in Java (java.util.HashMap). c. A Bloom filter is full if no more hash functions can be added to it.
d. A Bloom filter always returns FALSE when testing for an element that was not previously added Ans. a
20. Which of the following streaming windows show valid bucket representations according to the DGIM rules?
a. 1 0 1 1 1 0 1 0 1 1 1 1 0 1 0 1 b. 1 0 1 1 1 0 0 0 0 1 1 0 0 0 1 0 1 1 1 0 0 1 c. 1 1 1 1 0 0 1 1 1 0 1 0 1 d. 1 0 1 1 0 0 0 1 0 1 1 1 0 1 1 0 0 1 0 1 1
Ans. d
21. For which of the following streams is the second-order moment F2 greater than 45?
a. 10 5 5 10 10 10 1 1 1 10 b. 10 10 10 10 10 5 5 5 5 5 c. 1 1 1 1 1 5 10 10 5 1 d. 10 10 10 10 10 10 10 10 10 10
Ans. b and d
22. What is the space complexity of the FREQUENT algorithm? Recall that it aims to find all elements in a sequence whose frequency exceeds 1/k of the total count. In the expressions below, n is the maximum value of each key and m is the maximum value of each counter.
a. O(k(log m + log n)) b. o(k(log m + log n)) c. O(log k(m + n)) d. o(log k(m + n))
Ans. a
Suppose that to get some information about something, you write a keyword in Google search. Google's analytical tools will then pre-compute large amounts of data to provide a quick output related to the keywords you have written.
19) Which of the following statements is correct about data mining?
a. It can be referred to as the procedure of mining knowledge from data b. Data mining can be defined as the procedure of extracting information from a set of the data c. The procedure of data mining also involves several other processes like data cleaning, data transformation, and data integration d. All of the above
Answer: d
Explanation: The term data mining can be defined as the process of extracting information from the massive collection of data. In other words, we can also say that data mining is the procedure of mining useful knowledge from a huge set of data.
25) The classification of the data mining system involves:
a. Database technology b. Information Science c. Machine learning d. All of the above
Answer: d
Explanation: Generally, the classification of a data mining system depends on the following criteria: Database technology, machine learning, visualization, information science, and several other disciplines.
27) The issues like efficiency, scalability of data mining algorithms comes under_______
a. Performance issues b. Diverse data type issues c. Mining methodology and user interaction d. All of the above
Answer: a
Explanation: In order to extract information effectively from a huge collection of data in databases, the data mining algorithm must be efficient and scalable. Therefore the correct answer is A.
KIET Group of Institutions
Department of IT
Course: B.Tech., VI Sem, MCQ Assignment (2020-21), Even Semester, Unit 4, Data Analytics (KIT601)
1. What does Apriori algorithm do? a. It mines all frequent patterns through pruning rules with lesser support b. It mines all frequent patterns through pruning rules with higher support c. Both a and b d. None of these
Ans. a
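A compact sketch of the Apriori idea described in this question (illustrative only; the transactions and the min_support value are assumptions): candidate itemsets are grown level by level, and any candidate whose support falls below the threshold is pruned, which implicitly prunes all of its supersets as well.
from itertools import combinations

transactions = [{"milk", "bread"}, {"milk", "diapers", "beer"},
                {"milk", "bread", "diapers"}, {"bread", "diapers"}]
min_support = 2   # absolute (count-based) support threshold

def support(itemset):
    return sum(1 for t in transactions if itemset <= t)

# Level 1: frequent single items.
items = {i for t in transactions for i in t}
frequent = [{frozenset([i]) for i in items if support(frozenset([i])) >= min_support}]

# Grow candidates level by level; infrequent itemsets are pruned along with their supersets.
k = 2
while frequent[-1]:
    prev = frequent[-1]
    candidates = {a | b for a in prev for b in prev if len(a | b) == k}
    frequent.append({c for c in candidates if support(c) >= min_support})
    k += 1

for level in frequent[:-1]:
    print(sorted(tuple(sorted(s)) for s in level))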
2. What techniques can be used to improve the efficiency of apriori algorithm? a. hash based techniques b. transaction reduction c. Partitioning d. All of these
Ans. d
3. What do you mean by support(A)?
a. Total number of transactions containing A b. Total Number of transactions not containing A c. Number of transactions containing A / Total number of transactions d. Number of transactions not containing A / Total number of transactions
Ans. c
4. Which of the following is a direct application of frequent itemset mining? a. Social Network Analysis b. Market Basket Analysis c. Outlier detection
d. intrusion detection
Ans. b
5. When do you consider an association rule interesting? a. If it only satisfies min_support b. If it only satisfies min_confidence c. If it satisfies both min_support and min_confidence d. There are other measures to check as well
Ans. c
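To make the support and confidence measures concrete (an illustrative sketch with made-up transactions, as in the answer above): support(A→B) is the fraction of transactions containing both A and B, and confidence(A→B) is support(A and B) divided by support(A); a rule is usually kept only when both exceed their minimum thresholds.
transactions = [{"milk", "bread"}, {"milk", "diapers"},
                {"milk", "bread", "diapers"}, {"bread", "diapers"}]

def support(itemset):
    # Relative support: fraction of transactions containing the itemset.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    # confidence(A -> B) = support(A and B) / support(A)
    return support(antecedent | consequent) / support(antecedent)

rule_a, rule_b = {"milk"}, {"bread"}
print("support:", support(rule_a | rule_b))        # 2/4 = 0.5
print("confidence:", confidence(rule_a, rule_b))   # 0.5 / 0.75 = 0.666...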
6. What is the difference between absolute and relative support? a. Absolute -Minimum support count threshold and Relative-Minimum support threshold b. Absolute-Minimum support threshold and Relative-Minimum support count threshold c. Both a and b d. None of these
Ans. a
7. What is the relation between candidate and frequent itemsets?
a. A candidate itemset is always a frequent itemset b. A frequent itemset must be a candidate itemset c. No relation between the two d. None of these
Ans. b
8. What is the principle on which the Apriori algorithm works?
a. If a rule is infrequent, its specialized rules are also infrequent b. If a rule is infrequent, its generalized rules are also infrequent c. Both a and b d. None of these
Ans. a
9. Which of these is not a frequent pattern mining algorithm a. Apriori b. FP growth c. Decision trees d. Eclat
Ans. c
10. What are closed frequent itemsets?
a. A closed itemset b. A frequent itemset c. An itemset which is both closed and frequent d. None of these
Ans. c
11. What are maximal frequent itemsets? a. A frequent item set whose no super-itemset is frequent b. A frequent itemset whose super-itemset is also frequent c. Both a and b d. None of these
Ans. a
12. What is association rule mining?
a. Same as frequent itemset mining b. Finding of strong association rules using frequent itemsets c. Both a and b d. None of these
Ans. b
13. What is frequent pattern growth?
a. Same as frequent itemset mining b. Use of hashing to make discovery of frequent itemsets more efficient c. Mining of frequent itemsets without candidate generation d. None of these
Ans. c
14. When is sub-itemset pruning done?
a. A frequent itemset ‘P’ is a proper subset of another frequent itemset ‘Q’ b. Support (P) = Support(Q) c. When both a and b is true d. When a is true and b is not
Ans. c
15. Our use of association analysis will yield the same frequent itemsets and strong association rules whether a specific item occurs once or three times in an individual transaction
a. TRUE b. FALSE c. Both a and b d. None of these
Ans. a
16. The number of iterations in apriori __
a. increases with the size of the data b. decreases with the increase in size of the data c. increases with the size of the maximum frequent set d. decreases with increase in size of the maximum frequent set
Ans. c
17. Frequent item sets are a. Superset of only closed frequent item sets b. Superset of only maximal frequent item sets c. Subset of maximal frequent item sets d. Superset of both closed frequent item sets and maximal frequent item sets
Ans. d
18. Significant Bottleneck in the Apriori algorithm is a. Finding frequent itemsets b. pruning c. Candidate generation d. Number of iterations
Ans. c
19. Which Association Rule would you prefer a. High support and medium confidence b. High support and low confidence c. Low support and high confidence d. Low support and low confidence
Ans. c
20. The apriori property means a. If a set cannot pass a test, its supersets will also fail the same test b. To decrease the efficiency, do level-wise generation of frequent item sets c. To improve the efficiency, do level-wise generation of frequent item sets d. If a set can pass a test, its supersets will fail the same test
Ans. a
21. To determine association rules from frequent item sets a. Only minimum confidence needed b. Neither support not confidence needed c. Both minimum support and confidence are needed d. Minimum support is needed
Ans. c
22. A collection of one or more items is called as _____
( a ) Itemset ( b ) Support ( c ) Confidence ( d ) Support Count Ans. a
23. Frequency of occurrence of an itemset is called as _____
(a) Support (b) Confidence (c) Support Count (d) Rules Ans. c
24. An itemset whose support is greater than or equal to a minimum support threshold is ______
(a) Itemset (b) Frequent Itemset (c) Infrequent items (d) Threshold values
Ans. b
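The same ideas can be tried end to end in R with the arules package (a sketch assuming arules is installed; the baskets below are made up for illustration):

library(arules)                      # provides the transactions class, apriori() and eclat()
baskets <- list(
  c("milk", "bread"),
  c("milk", "bread", "butter"),
  c("bread", "butter"),
  c("milk", "butter")
)
trans <- as(baskets, "transactions")
# Mine rules that satisfy both minimum support and minimum confidence
rules <- apriori(trans, parameter = list(supp = 0.5, conf = 0.7))
inspect(rules)                       # the strong association rules
freq <- eclat(trans, parameter = list(supp = 0.5))
inspect(freq)                        # the frequent itemsets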
25. The goal of clustering is to- a. Divide the data points into groups b. Classify the data point into different classes c. Predict the output values of input data points d. All of the above
Ans. a
26. Clustering is a- a. Supervised learning b. Unsupervised learning c. Reinforcement learning d. None Ans. b 27. Which of the following clustering algorithms suffers from the problem of convergence at local optima? a. K-Means clustering b. Hierarchical clustering c. Diverse clustering d. All of the above Ans. a
28. Which version of the clustering algorithm is most sensitive to outliers? a. K-means clustering algorithm b. K-modes clustering algorithm c. K-medians clustering algorithm d. None
Ans. a 29. Which of the following is a bad characteristic of a dataset for clustering analysis-
a. Data points with outliers b. Data points with different densities c. Data points with non-convex shapes d. All of the above Ans. d
30. For clustering, we do not require- a. Labeled data b. Unlabeled data c. Numerical data d. Categorical data
Ans. a 31. The final output of Hierarchical clustering is- a. The number of cluster centroids b. The tree representing how close the data points are to each other c. A map defining the similar data points into individual groups d. All of the above Ans. b
32. Which of the step is not required for K-means clustering?
a. a distance metric b. initial number of clusters c. initial guess as to cluster centroids d. None Ans. d
33. Which of the following uses merging approach? a. Hierarchical clustering b. Partitional clustering c. Density-based clustering d. All of the above Ans. a 34. When does k-means clustering stop creating or optimizing clusters? a. After finding no new reassignment of data points b. After the algorithm reaches the defined number of iterations c. Both A and B d. None Ans. c 35. Which of the following clustering algorithm follows a top to bottom approach? a. K-means b. Divisible c. Agglomerative d. None Ans. b 36. Which algorithm does not require a dendrogram? a. K-means b. Divisible c. Agglomerative d. None
Ans. a 37. What is a dendrogram?
a. A hierarchical structure b. A diagram structure c. A graph structure d. None
Ans. a
38. Which one of the following can be considered as the final output of the hierarchal type of clustering? a. A tree which displays how the close thing are to each other b. Assignment of each point to clusters c. Finalize estimation of cluster centroids d. None of the above
Ans. a
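As a quick illustration of the clustering questions above (k-means needs a distance metric, an initial number of clusters and initial centroids; hierarchical clustering returns a dendrogram), here is a short base-R sketch on the built-in iris data:

data(iris)
x <- iris[, 1:4]                 # numeric features only; k-means needs numeric data
set.seed(1)                      # results depend on the random initial centroids
km <- kmeans(x, centers = 3)     # partition the rows into 3 clusters
table(km$cluster)                # sizes of the clusters

hc <- hclust(dist(x))            # agglomerative (bottom-up) hierarchical clustering
plot(hc)                         # the dendrogram: a tree showing how close the points are
cutree(hc, k = 3)                # cut the tree to obtain 3 flat clusters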
39. Which one of the following statements about the K-means clustering is incorrect?
a. The goal of the k-means clustering is to partition (n) observation into (k) clusters b. K-means clustering can be defined as the method of quantization c. The nearest neighbor is the same as the K-means d. All of the above
Ans. c
40. The self-organizing maps can also be considered as the instance of _________ type of learning.
a. Supervised learning b. Unsupervised learning c. Missing data imputation d. Both A & C
Ans. b
41. Euclidean distance measure can also be defined as ___________
a. The process of finding a solution for a problem simply by enumerating all possible solutions according to some predefined order and then testing them
b. The distance between two points as calculated using the Pythagoras theorem c. A stage of the KDD process in which new data is added to the existing selection. d. All of the above
Ans. b
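In n dimensions the Euclidean distance is d(p, q) = sqrt((p1 - q1)^2 + ... + (pn - qn)^2), i.e. the Pythagorean formula generalized. A two-line R check with illustrative points:

p <- c(1, 2, 3); q <- c(4, 6, 3)
sqrt(sum((p - q)^2))     # 5, the familiar 3-4-5 right triangle
dist(rbind(p, q))        # dist() uses the Euclidean distance by default and also gives 5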
42. Which of the following refers to the sequence of pattern that occurs frequently?
a. Frequent sub-sequence b. Frequent sub-structure c. Frequent sub-items d. All of the above
Ans. a 43. Which method of analysis does not classify variables as dependent or independent? a) Regression analysis b) Discriminant analysis c) Analysis of variance d) Cluster analysis Answer: (d)
1. The Process of describing the data that is huge and complex to store and process is known as
a. Analytics b. Data mining c. Big Data d. Data Warehouse
Ans C
2. Data generated from online transactions is one of the examples of the volume of big data. Is this True or False? a. TRUE b. FALSE
Ans. a 3. Velocity is the speed at which the data is processed
a. TRUE b. FALSE
Ans. b
4. _____________ have a structure but cannot be stored in a database.
a. Structured b. Semi-Structured c. Unstructured d. None of these
Ans. b 5. ____________ refers to the ability to turn your data into something useful for business.
a. Velocity b. Variety c. Value d. Volume
Ans. C
6. Value tells the trustworthiness of data in terms of quality and accuracy.
a. TRUE b. FALSE
Ans. b 7. GFS consists of a ____________ Master and ___________ Chunk Servers a. Single, Single b. Multiple, Single c. Single, Multiple
d. Multiple, Multiple
Ans. c
8. Files are divided into ____________ sized Chunks. a. Static b. Dynamic c. Fixed d. Variable Ans. c
9. ____________is an open source framework for storing data and running application on clusters of commodity hardware. a. HDFS b. Hadoop c. MapReduce d. Cloud Ans. B
10. HDFS Stores how much data in each clusters that can be scaled at any time? a. 32 b. 64 c. 128 d. 256 Ans. c
11. Hadoop MapReduce allows you to perform distributed parallel processing on large volumes of data quickly and efficiently. Is this statement True or False? a. TRUE b. FALSE Ans. a
12. Hortonworks was introduced by Cloudera and owned by Yahoo. a. TRUE b. FALSE Ans. b
13. Hadoop YARN is used for Cluster Resource Management in Hadoop Ecosystem. a. TRUE b. FALSE Ans. a
14. Google Introduced MapReduce Programming model in 2004. a. TRUE b. FALSE Ans. A
15.______________ phase sorts the data & ____________creates logical clusters. a. Reduce, YARN b. MAP, YARN c. REDUCE, MAP d. MAP, REDUCE Ans. d
16. There is only one operation between Mapping and Reducing is it True or False...
a. TRUE b. FALSE
Ans. A
17. __________ is one of the factors considered before adopting Big Data technology. a. Validation b. Verification c. Data d. Design Ans. a
18. _________ for improving supply chain management to optimize stock management, replenishment, and forecasting; a. Descriptive b. Diagnostic c. Predictive d. Prescriptive Ans. c
19. which among the following is not a Data mining and analytical applications? a. profile matching b. social network analysis c. facial recognition d. Filtering Ans. d
20. ________________ occurs as a result of data accessibility, data latency, data availability, or limits on bandwidth in relation to the size of inputs. a. Computation-restricted throttling b. Large data volumes c. Data throttling d. Benefits from data parallelization Ans. c
21. As an example, an expectation of using a recommendation engine would be to increase same-customer sales by adding more items into the market basket. a. Lowering costs b. Increasing revenues c. Increasing productivity d. Reducing risk Ans. b
22. Which storage subsystem can support massive data volumes of increasing size. a. Extensibility b. Fault tolerance c. Scalability d. High-speed I/O capacity Ans. c
23. ______________provides performance through distribution of data and fault tolerance through replication a. HDFS b. PIG c. HIVE d. HADOOP
Ans. a
24. ______________ is a programming model for writing applications that can process Big Data in parallel on multiple nodes. a. HDFS b. MAP REDUCE c. HADOOP d. HIVE Ans. b
25. _____________________ takes the grouped key-value paired data as input and runs a Reducer function on each one of them. a. MAPPER b. REDUCER c. COMBINER d. PARTITIONER Ans. b
26. _______________ is a type of local Reducer that groups similar data from the map phase into identifiable sets. a. MAPPER b. REDUCER c. COMBINER d. PARTITIONER. Ans. c
27. MongoDB is __________________ a. Column Based b. Key Value Based c. Document Based d. Graph Based Ans. c
28. ____________ is the process of storing data records across multiple machines a. Sharding b. HDFS c. HIVE d. HBASE Ans. a
29. The results of a hive query can be stored as a. Local File b. HDFS File c. Both d. Cannot be stored Ans. c 30. The position of a specific column in a Hive table a. can be anywhere in the table creation clause b. must match the position of the corresponding data in the data file c. Must match the position only for date time data type in the data file d. Must be arranged alphabetically Ans. b 31. The Hbase tables are A. Made read only by setting the read-only option B. Always writeable
C. Always read-only D. Are made read only using the query to the table
Ans. a 32. Hbase creates a new version of a record during A. Creation of a record B. Modification of a record C. Deletion of a record D. All the above Ans. d 33. Which among the following are incorrect in regards with NoSQL? a. Its Easy and ready to manage with clusters. b. Suitable for upcoming data explosions. c. It requires to keep track with data structure d. Provide easy and flexible system. Ans. c 34. Which Database Administrator job was in trends with job trends? a. MongoDB b. CouchDB c. SimpleDB d. Redis Ans. a 35. No SQL Means _________________ a. Not SQL b. No Usage of SQl c. Not Only SQL d. Not for SQL Ans. c 36. A list of 5 pulse rates is: 70, 64, 80, 74, 92. What is the median for this list? a. 74 b. 76 c. 77 d. 80 Ans. a 37. Which of the following would indicate that a dataset is not bell-shaped? a. The range is equal to 5 standard deviations. b. The range is larger than the interquartile range. c. The mean is much smaller than the median. d. There are no outliers Ans. c 38. What is the effect of an outlier on the value of a correlation coefficient? a. An outlier will always decrease a correlation coefficient. b. An outlier will always increase a correlation coefficient. c. An outlier might either decrease or increase a correlation coefficient, depending on where it is in relation to the other points. d. An outlier will have no effect on a correlation coefficient. Ans. c 39. One use of a regression line is a. to determine if any x-values are outliers. b. to determine if any y-values are outliers. c. to determine if a change in x causes a change in y. d. to estimate the change in y for a one-unit change in x. Ans. d 40. Which package contains most of the basic function in R. a. Root b. Basic c. Parent
d. R
Ans. b
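The statistics questions at the end of this set (the median of the pulse rates, the effect of an outlier on a correlation coefficient, and the meaning of a regression slope) can be verified with a few lines of base R; the extra data points below are made up for illustration:

pulse <- c(70, 64, 80, 74, 92)
median(pulse)                       # 74 (question 36)

# An outlier may raise or lower a correlation, depending on where it falls (question 38)
set.seed(1)
x <- 1:10; y <- x + rnorm(10, sd = 0.5)
cor(x, y)                           # strong positive correlation
cor(c(x, 30), c(y, -10))            # a single extreme point changes the coefficient sharply

# The slope estimates the change in y for a one-unit change in x (question 39)
coef(lm(y ~ x))["x"]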
SET II
1. who was the developer of Hadoop language?
A. Apache Software Foundation B. Hadoop Software Foundation C. Sun Microsystems D. Bell Labs View Answer Ans : A
Explanation: Hadoop Developed by: Apache Software Foundation.
2. Hadoop is written in which language?
A. C B. C++ C. Java D. Python View Answer Ans : C
Explanation: Hadoop is written in Java. 3. What was the initial release date of Hadoop?
A. 1st April 2007 B. 1st April 2006 C. 1st April 2008 D. 1st April 2005 View Answer Ans : B
Explanation: Initial release: April 1, 2006. 4. What license is Hadoop distributed under?
A. Apache License 2.1 B. Apache License 2.2 C. Apache License 2.0 D. Apache License 1.0 View Answer Ans : C
Explanation: Hadoop is Open Source, released under Apache 2 license.
5. IBM and ________ have announced a major initiative to use Hadoop to support university courses in distributed computer programming.
A. Google B. Apple C. Facebook D. Microsoft View Answer Ans : A
Explanation: Google and IBM announced a university initiative to address Internet-scale computing. 6. On which platform does Hadoop run?
A. Bare metal B. Debian C. Cross-platform D. Unix-Like View Answer Ans : C
Explanation: Hadoop has support for cross platform operating system.
10. Which of the following is not a feature of Hadoop?
A. Suitable for Big Data Analysis B. Scalability C. Robust D. Fault Tolerance View Answer Ans : C
Explanation: Robust is not a feature of Hadoop.
1. The MapReduce algorithm contains two important tasks, namely __________.
A. mapped, reduce B. mapping, Reduction C. Map, Reduction D. Map, Reduce View Answer Ans : D
Explanation: The MapReduce algorithm contains two important tasks, namely Map and Reduce. 2. _____ takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
A. Map B. Reduce C. Both A and B D. Node View Answer Ans : A
Explanation: Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). 3. ______ task, which takes the output from a map as an input and combines those data tuples into a smaller set of tuples.
A. Map B. Reduce C. Node D. Both A and B View Answer Ans : B
Explanation: Reduce task, which takes the output from a map as an input and combines those data tuples into a smaller set of tuples. 4. In how many stages the MapReduce program executes?
A. 2 B. 3 C. 4 D. 5 View Answer
Ans : B
Explanation: A MapReduce program executes in three stages, namely the map stage, the shuffle stage, and the reduce stage. 5. Which of the following is used to schedule jobs and track the assigned jobs for the Task tracker?
A. SlaveNode B. MasterNode C. JobTracker D. Task Tracker View Answer Ans : C
Explanation: JobTracker : Schedules jobs and tracks the assigned jobs for the Task tracker. 6. Which of the following is used for an execution of a Mapper or a Reducer on a slice of data?
A. Task B. Job C. Mapper D. PayLoad View Answer Ans : A
Explanation: Task : An execution of a Mapper or a Reducer on a slice of data. 7. Which of the following command runs a DFS admin client?
A. secondaryadminnode B. nameadmin C. dfsadmin D. adminsck View Answer Ans : C
Explanation: dfsadmin : Runs a DFS admin client. 8. Point out the correct statement.
A. MapReduce tries to place the data and the compute as close as possible B. Map Task in MapReduce is performed using the Mapper() function C. Reduce Task in MapReduce is performed using the Map() function D. None of the above View Answer Ans : A
Explanation: This feature of MapReduce is "Data Locality". 9. Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in ____________
A. C B. C# C. Java D. None of the above View Answer
Ans : C
Explanation: Hadoop Pipes is a SWIG- compatible C++ API to implement MapReduce applications (non JNITM based). 10. The number of maps is usually driven by the total size of ____________
A. Inputs B. Output C. Task D. None of the above View Answer Ans : A
Explanation: Total size of inputs means the total number of blocks of the input files. 1. What is full form of HDFS?
A. Hadoop File System B. Hadoop Field System C. Hadoop File Search D. Hadoop Field search View Answer Ans : A
Explanation: Hadoop File System was developed using distributed file system design. 2. HDFS works in a __________ fashion.
A. worker-master fashion B. master-slave fashion C. master-worker fashion D. slave-master fashion View Answer Ans : B
Explanation: HDFS follows the master-slave architecture. 3. Which of the following are the Goals of HDFS?
A. Fault detection and recovery B. Huge datasets C. Hardware at data D. All of the above View Answer Ans : D
Explanation: All the above option are the goals of HDFS. 4. ________ NameNode is used when the Primary NameNode goes down.
A. Rack B. Data C. Secondary D. Both A and B View Answer Ans : C
Explanation: Secondary namenode is used for all time availability and reliability.
5. The minimum amount of data that HDFS can read or write is called a _____________.
A. Datanode B. Namenode C. Block D. None of the above View Answer Ans : C
Explanation: The minimum amount of data that HDFS can read or write is called a Block. 6. The default block size is ______.
A. 32MB B. 64MB C. 128MB D. 16MB View Answer Ans : B
Explanation: The default block size is 64MB, but it can be increased as per the need to change in HDFS configuration. 7. For every node (Commodity hardware/System) in a cluster, there will be a _________.
A. Datanode B. Namenode C. Block D. None of the above View Answer Ans : A
Explanation: For every node (Commodity hardware/System) in a cluster, there will be a datanode. 8. Which of the following is not a feature of HDFS?
A. It is suitable for the distributed storage and processing. B. Streaming access to file system data. C. HDFS provides file permissions and authentication. D. Hadoop does not provide a command interface to interact with HDFS. View Answer Ans : D
Explanation: Hadoop does provide a command interface to interact with HDFS, so option D is the statement that is not a feature. 9. HDFS is implemented in _____________ language.
A. Perl B. Python C. Java D. C View Answer Ans : C
Explanation: HDFS is implemented in Java and any computer which can run Java can host a NameNode/DataNode on it.
10. During start up, the ___________ loads the file system state from the fsimage and the edits log file.
A. Datanode B. Namenode C. Block D. ActionNode View Answer Ans : B
Explanation: During start up, the NameNode loads the file system state from the fsimage and the edits log file. 1. Which of the following is not true about Pig?
A. Apache Pig is an abstraction over MapReduce B. Pig can not perform all the data manipulation operations in Hadoop. C. Pig is a tool/platform which is used to analyze larger sets of data representing them as data flows. D. None of the above View Answer Ans : B
Explanation: Pig is generally used with Hadoop; we can perform all the data manipulation operations in Hadoop using Apache Pig. 2. Which of the following is/are a feature of Pig?
A. Rich set of operators B. Ease of programming C. Extensibility D. All of the above View Answer Ans : D
Explanation: All options are the following Features of Pig. 3. In which year apache Pig was released?
A. 2005 B. 2006 C. 2007 D. 2008 View Answer Ans : B
Explanation: In 2006, Apache Pig was developed as a research project. 4. Pig operates mainly in how many modes?
A. 2 B. 3 C. 4 D. 5 View Answer Ans : A
Explanation: You can run Pig (execute Pig Latin statements and Pig commands) in two modes: interactive mode and batch mode. 5. Which of the following companies developed Pig?
A. Google B. Yahoo C. Microsoft D. Apple View Answer Ans : B
Explanation: Apache Pig was developed as a research project at Yahoo, especially to create and execute MapReduce jobs on every dataset. 6. Which of the following function is used to read data in PIG?
A. Write B. Read C. Perform D. Load View Answer Ans : D
Explanation: PigStorage is the default load function. 7. __________ is a framework for collecting and storing script-level statistics for Pig Latin.
A. Pig Stats B. PStatistics C. Pig Statistics D. All of the above View Answer Ans : C
Explanation: The new Pig statistics and the existing Hadoop statistics can also be accessed via the Hadoop job history file. 8. Which of the following is true statement?
A. Pig is a high level language. B. Performing a Join operation in Apache Pig is pretty simple. C. Apache Pig is a data flow language. D. All of the above View Answer Ans : D
Explanation: All of the options are true statements. 9. Which of the following will compile the Pigunit?
A. $pig_trunk ant pigunit-jar B. $pig_tr ant pigunit-jar C. $pig_ ant pigunit-jar D. $pigtr_ ant pigunit-jar View Answer Ans : A
Explanation: The compile will create the pigunit.jar file.
10. Point out the wrong statement.
A. Pig can invoke code in language like Java Only B. Pig enables data workers to write complex data transformations without knowing Java C. Pig's simple SQL-like scripting language is called Pig Latin, and appeals to developers already familiar with scripting languages and SQL D. Pig is complete, so you can do all required data manipulations in Apache Hadoop with Pig View Answer Ans : A
Explanation: Through the User Defined Functions(UDF) facility in Pig, Pig can invoke code in many languages like JRuby, Jython and Java. 1. Which of the following is/are INCORRECT with respect to Hive?
A. Hive provides SQL interface to process large amount of data B. Hive needs a relational database like oracle to perform query operations and store data. C. Hive works well on all files stored in HDFS D. Both A and B View Answer Ans : B
Explanation: Hive needs a relational database like oracle to perform query operations and store data is incorrect with respect to Hive. 2. Which of the following is not a Features of HiveQL?
A. Supports joins B. Supports indexes C. Support views D. Support Transactions View Answer Ans : D
Explanation: Support Transactions is not a Features of HiveQL. 3. Which of the following operator executes a shell command from the Hive shell?
A. | B. ! C. # D. $ View Answer Ans : B
Explanation: Exclamation operator is for execution of command. 4. Hive uses _________ for logging.
A. logj4 B. log4l C. log4i D. log4j View Answer Ans : D
Explanation: By default Hive will use hive-log4j.default in the conf/ directory of the Hive installation. 5. HCatalog is installed with Hive, starting with Hive release is ___________
A. 0.10.0 B. 0.9.0 C. 0.11.0 D. 0.12.0 View Answer Ans : C
Explanation: hcat commands can be issued as hive commands, and vice versa. 6. _______ supports a new command shell Beeline that works with HiveServer2.
A. HiveServer2 B. HiveServer3 C. HiveServer4 D. HiveServer5 View Answer Ans : A
Explanation: The Beeline shell works in both embedded mode as well as remote mode. 7. The ________ allows users to read or write Avro data as Hive tables.
A. AvroSerde B. HiveSerde C. SqlSerde D. HiveQLSerde View Answer Ans : A
Explanation: AvroSerde understands compressed Avro files. 8. Which of the following data type is supported by Hive?
A. map B. record C. string D. enum View Answer Ans : D
Explanation: Hive has no concept of enums. 9. We need to store skill set of MCQs(which might have multiple values) in MCQs table, which of the following is the best way to store this information in case of Hive?
A. Create a column in MCQs table of STRUCT data type B. Create a column in MCQs table of MAP data type C. Create a column in MCQs table of ARRAY data type D. As storing multiple values in a column of MCQs itself is a violation View Answer Ans : C
Explanation: Option C is correct.
10. Letsfindcourse is generating huge amount of data. They are generating huge amount of sensor data from different courses which was unstructured in form. They moved to Hadoop framework for storing and analyzing data. What technology in Hadoop framework, they can use to analyse this unstructured data?
A. MapReduce programming B. Hive C. RDBMS D. None of the above View Answer Ans : A
Explanation: MapReduce programming is the right answer. 1. which of the following is correct statement?
A. HBase is a distributed column-oriented database B. Hbase is not open source C. Hbase is horizontally scalable. D. Both A and C View Answer Ans : D
Explanation: HBase is a distributed column-oriented database built on top of the Hadoop file system. It is an open-source project and is horizontally scalable. 2. which of the following is not a feature of Hbase?
A. HBase is lateral scalable. B. It has automatic failure support. C. It provides consistent read and writes. D. It has easy java API for client. View Answer Ans : A
Explanation: Option A is incorrect because HBase is linearly scalable. 3. When did HBase was first released?
A. April 2007 B. March 2007 C. February 2007 D. May 2007 View Answer Ans : C
Explanation: HBase was first released in February 2007. Later in January 2008, HBase became a sub project of Apache Hadoop. 4. Apache HBase is a non-relational database modeled after Google's _________
A. BigTop B. Bigtable C. Scanner D. FoundationDB View Answer Ans : B
Explanation: Bigtable acts up on Google File System, likewise Apache HBase works on top of Hadoop and HDFS. 5. HBaseAdmin and ____________ are the two important classes in this package that provide DDL functionalities.
A. HTableDescriptor B. HDescriptor C. HTable D. HTabDescriptor View Answer Ans : A
Explanation: Java provides an Admin API to achieve DDL functionalities through programming 6. which of the following is correct statement?
A. HBase provides fast lookups for larger tables. B. It provides low latency access to single rows from billions of records C. HBase is a database built on top of the HDFS. D. All of the above View Answer Ans : D
Explanation: All the options are correct. 7. HBase supports a ____________ interface via Put and Result.
A. bytes-in/bytes-out B. bytes-in C. bytes-out D. None of the above View Answer Ans : A
Explanation: Input could be strings, numbers, complex objects, or even images as long as they can rendered as bytes. 8. Which command is used to disable all the tables matching the given regex?
A. remove all B. drop all C. disable_all D. None of the above View Answer Ans : C
Explanation: The syntax for disable_all command is as follows : hbase > disable_all 'r.*' 9. _________ is the main configuration file of HBase.
A. hbase.xml B. hbase-site.xml C. hbase-site-conf.xml D. hbase-conf.xml View Answer Ans : B
Explanation: Set the data directory to an appropriate location by opening the HBase home folder in /usr/local/HBase. 10. which of the following is incorrect statement?
A. HBase is built for wide tables B. Transactions are there in HBase. C. HBase has de-normalized data. D. HBase is good for semi-structured as well as structured data. View Answer Ans : B
Explanation: No transactions are there in HBase. 1. R was created by?
A. Ross Ihaka B. Robert Gentleman C. Both A and B D. Ross Gentleman View Answer Ans : C
Explanation: R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team. 2. R allows integration with the procedures written in the?
A. C B. Ruby C. Java D. Basic View Answer Ans : A
Explanation: R allows integration with the procedures written in the C, C++, .Net, Python or FORTRAN languages for efficiency. 3. R is free software distributed under a GNU-style copy left, and an official part of the GNU project called?
A. GNU A B. GNU S C. GNU L D. GNU R View Answer Ans : B
Explanation: R is free software distributed under a GNU-style copy left, and an official part of the GNU project called GNU S. 4. R made its first appearance in?
A. 1992 B. 1995 C. 1993 D. 1994 View Answer
Ans : C
Explanation: R made its first appearance in 1993. 5. Which of the following is true about R?
A. R is a well-developed, simple and effective programming language B. R has an effective data handling and storage facility C. R provides a large, coherent and integrated collection of tools for data analysis. D. All of the above View Answer Ans : D
Explanation: All of the above statement are true. 6. Point out the wrong statement?
A. Setting up a workstation to take full advantage of the customizable features of R is a straightforward thing B. q() is used to quit the R program C. R has an inbuilt help facility similar to the man facility of UNIX D. Windows versions of R have other optional help systems also View Answer Ans : B
Explanation: help command is used for knowing details of particular command in R. 7. Command lines entered at the console are limited to about ________ bytes
A. 4095 B. 4096 C. 4097 D. 4098 View Answer Ans : A
Explanation: Elementary commands can be grouped together into one compound expression by braces (‘{’ and ‘}’). 8. R language is a dialect of which of the following languages?
A. s B. c C. sas D. matlab View Answer Ans : A
Explanation: The R language is a dialect of S which was designed in the 1980s. Since the early 90’s the life of the S language has gone down a rather winding path. The scoping rules for R are the main feature that makes it different from the original S language. 9. How many atomic vector types does R have?
A. 3 B. 4 C. 5 D. 6 View Answer
Ans : D
Explanation: R language has 6 atomic data types. They are logical, integer, real, complex, string (or character) and raw. There is also a class for “raw” objects, but they are not commonly used directly in data analysis. 10. R files has an extension _____.
A. .S B. .RP C. .R D. .SP View Answer Ans : C
Explanation: All R files have an extension .R. R provides a mechanism for recalling and re-executing previous commands. All S program files have an extension .S, but R has many more functions than S. 1. What will be output for the following code?
v <- TRUE
print(class(v))
A. logical B. Numeric C. Integer D. Complex View Answer Ans : A
Explanation: It produces the following result : [1] "logical"
2. What will be output for the following code?
v <- ""TRUE""
print(class(v))
A. logical B. Numeric C. Integer D. Character View Answer Ans : D
Explanation: It produces the following result : [1] "character"
3. In R programming, the very basic data types are the R-objects called?
A. Lists B. Matrices
C. Vectors D. Arrays View Answer Ans : C
Explanation: In R programming, the very basic data types are the R-objects called vectors
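A short console illustration of the point above: vectors are the basic R objects, and class() reports the atomic type (logical, integer, numeric, complex, character or raw):

v <- c(TRUE, FALSE);      class(v)   # "logical"
v <- c(1L, 2L, 3L);       class(v)   # "integer"
v <- c(1.5, 2.5);         class(v)   # "numeric"
v <- 1 + 2i;              class(v)   # "complex"
v <- c("TRUE", "FALSE");  class(v)   # "character" (quoted values are strings)
v <- charToRaw("R");      class(v)   # "raw"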
4. Data Frames are created using the?
A. frame() function B. data.frame() function C. data() function D. frame.data() function View Answer Ans : B
Explanation: Data Frames are created using the data.frame() function 5. Which functions gives the count of levels?
A. level B. levels C. nlevels D. nlevel View Answer Ans : C
Explanation: Factors are created using the factor() function. The nlevels functions gives the count of levels. 6. Point out the correct statement?
A. Empty vectors can be created with the vector() function B. A sequence is represented as a vector but can contain objects of different classes C. "raw” objects are commonly used directly in data analysis D. The value NaN represents undefined value View Answer Ans : A
Explanation: A vector can only contain objects of the same class. 7. What will be the output of the following R code?
> x <- vector(""numeric"", length = 10)
> x
A. 1 0 B. 0 0 0 0 0 0 0 0 0 0 C. 0 1 D. 0 0 1 1 0 1 1 0 View Answer Ans : B
Explanation: You can also use the vector() function to initialize vectors.
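Putting questions 4 to 7 together, a small sketch showing data.frame(), factor() with nlevels(), and vector() initialization:

x <- vector("numeric", length = 10)  # initializes a numeric vector filled with zeros
x                                    # 0 0 0 0 0 0 0 0 0 0

df <- data.frame(name = c("A", "B", "C"),
                 grade = factor(c("pass", "fail", "pass")))
nlevels(df$grade)                    # 2
levels(df$grade)                     # "fail" "pass"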
8. What will be output for the following code?
> sqrt(-17)
A. -4.02 B. 4.02 C. 3.67 D. NAN View Answer Ans : D
Explanation: The square root of a negative number is not a real number, so sqrt(-17) produces NaN along with a warning. 9. _______ function returns a vector of the same size as x with the elements arranged in increasing order.
A. sort() B. orderasc() C. orderby() D. sequence() View Answer Ans : A
Explanation: There are other more flexible sorting facilities available like order() or sort.list() which produce a permutation to do the sorting. 10. What will be the output of the following R code?
> m <- matrix(nrow = 2, ncol = 3)
> dim(m)
A. 3 3 B. 3 2 C. 2 3 D. 2 2 View Answer Ans : C
Explanation: Matrices are constructed column-wise. 1. Which loop executes a sequence of statements multiple times and abbreviates the code that manages the loop variable?
A. for B. while C. do-while D. repeat View Answer Ans : D
Explanation: repeat loop : Executes a sequence of statements multiple times and abbreviates the code that manages the loop variable. 2. Which of the following true about for loop?
A. Repeats a statement or group of statements while a given condition is true. It tests the condition before executing the loop body. B. it tests the condition at the end of the loop body. C. Both A and B D. None of the above View Answer Ans : B
Explanation: for loop : Like a while statement, except that it tests the condition at the end of the loop body. 3. Which statement simulates the behavior of R switch?
A. Next B. Previous C. break D. goto View Answer Ans : A
Explanation: The next statement simulates the behavior of R switch. 4. In which statement terminates the loop statement and transfers execution to the statement immediately following the loop?
A. goto B. switch C. break D. label View Answer Ans : C
Explanation: Break : Terminates the loop statement and transfers execution to the statement immediately following the loop. 5. Point out the wrong statement?
A. Multi-line expressions with curly braces are just not that easy to sort through when working on the command line B. lappy() loops over a list, iterating over each element in that list C. lapply() does not always returns a list D. You cannot use lapply() to evaluate a function multiple times each with a different argument View Answer Ans : C
Explanation: lapply() always returns a list, regardless of the class of the input. 6. The mapply() function is a multivariate apply of sorts which applies a function in parallel over a set of arguments.
A. TRUE B. FALSE C. Can be true or false D. Can not say View Answer Ans : A
Explanation: True, The mapply() function is a multivariate apply of sorts which applies a function in parallel over a set of arguments. 7. Which of the following is valid body of split function?
A. function (x, f) B. function (x, f, drop = FALSE, …) C. function (x, drop = FALSE, …) D. function (drop = FALSE, …) View Answer Ans : B
Explanation: x is a vector (or list) or data frame 8. Which of the following character skip during execution?
v <- LETTERS[1:6]
for ( i in v) {
if (i == ""D"") {
next
}
print(i)
}
A. A B. B C. C D. D View Answer Ans : D
Explanation: When the above code is compiled and executed, it produces the following result : [1] "A" [1] "B" [1] "C" [1] "E" [1] "F"
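The loop questions above can be tried directly at the console. The sketch below shows repeat with break, and for with next (the same control flow as the LETTERS example just shown):

i <- 1
repeat {                    # repeat runs until an explicit break
  if (i > 3) break          # break terminates the loop
  print(i)
  i <- i + 1
}

for (ch in LETTERS[1:6]) {
  if (ch == "D") next       # next skips the rest of this iteration, so "D" is not printed
  print(ch)
}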
9. What will be output for the following code?
v <- LETTERS[1]
for ( i in v) {
print(v)
}
A. A B. A B C. A B C D. A B C D View Answer Ans : A
Explanation: The output for the following code : [1] "A" 10. What will be output for the following code?
v <- LETTERS[""A""]
for ( i in v) {
print(v)
}
A. A B. NAN C. NA D. Error View Answer Ans : C
Explanation: The output for the following code : [1] NA 1. An R function is created by using the keyword?
A. fun B. function C. declare D. extends View Answer Ans : B
Explanation: An R function is created by using the keyword function. 2. What will be output for the following code?
print(mean(25:82))
A. 1526 B. 53.5 C. 50.5 D. 55 View Answer Ans : B
Explanation: The code will find mean of numbers from 25 to 82 that is 53.5 3. Point out the wrong statement?
A. Functions in R are “second class objects” B. The writing of a function allows a developer to create an interface to the code, that is explicitly specified with a set of parameters
C. Functions provides an abstraction of the code to potential users D. Writing functions is a core activity of an R programmer View Answer Ans : A
Explanation: Functions in R are “first class objects”, which means that they can be treated much like any other R object. 4. What will be output for the following code?
> paste("a", "b", se = ":")
A. a+b B. a:b C. a-b D. None of the above View Answer Ans : D
Explanation: With the paste() function, the arguments sep and collapse must be named explicitly and in full if the default values are not going to be used. 5. Which function in R language is used to find out whether the means of 2 groups are equal to each other or not?
A. f.tests () B. l.tests () C. t.tests () D. p.tests () View Answer Ans : C
Explanation: t.tests () function in R language is used to find out whether the means of 2 groups are equal to each other. It is not used most commonly in R. It is used in some specific conditions. 6. What will be the output of log (-5.8) when executed on R console?
A. NA B. NAN C. 0.213 D. Error View Answer Ans : B
Explanation: Executing the above on the R console will display a warning that NaN (Not a Number) is produced, because it is not possible to take the log of a negative number. 7. Which function is preferred over sapply, as vapply allows the programmer to specify the output type?
A. Lapply B. Japply C. Vapply D. Zapply View Answer
Ans : C
Explanation: Vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use. simplify2array() is the utility called from sapply() when simplify is not false and is similarly called from mapply(). 8. How will you check if an element is present in a vector?
A. Match() B. Dismatch() C. Mismatch() D. Search() View Answer Ans : A
Explanation: It can be done using the match () function- match () function returns the first appearance of a particular element. The other way is to use %in% which returns a Boolean value either true or false. 9. You can check to see whether an R object is NULL with the _________ function.
A. is.null() B. is.nullobj() C. null() D. as.nullobj() View Answer Ans : A
Explanation: It is sometimes useful to allow an argument to take the NULL value, which might indicate that the function should take some specific action. 10. In the base graphics system, which function is used to add elements to a plot?
A. Boxplot() B. Text() C. Treat() D. Both A and B View Answer Ans : D
Explanation: In the base graphics system, boxplot or text function is used to add elements to a plot. 1. Which of the following syntax is used to install forecast package?
A. install.pack("forecast") B. install.packages("cast") C. install.packages("forecast") D. install.pack("forecastcast") View Answer Ans : C
Explanation: forecast is used for time series analysis 2. Which splits a data frame and returns a data frame?
A. apply B. ddply
C. stats D. plyr View Answer Ans : B
Explanation: ddply splits a data frame and returns a data frame. 3. Which of the following is an R package for the exploratory analysis of genetic and genomic data?
A. adeg B. adegenet C. anc D. abd View Answer Ans : B
Explanation: This package contains Classes and functions for genetic data analysis within the multivariate framework. 4. Which of the following contains functions for processing uniaxial minute-to-minute accelerometer data?
A. accelerometry B. abc C. abd D. anc View Answer Ans : A
Explanation: This package contains a collection of functions that perform operations on time-series accelerometer data, such as identifying non-wear time, flagging minutes that are part of an activity bout, and finding the maximum 10-minute average count value.
A. G.A. B. G2db C. G.S. D. G1DBN View Answer Ans : C
Explanation: The function returns a GriegSmith object which is a matrix with block sizes, sum of squares for each block size as well as mean sums of squares. G1DBN is a package performing Dynamic Bayesian Network Inference. 6. Which of the following package provide namespace management functions not yet present in base R?
A. stringr B. nbpMatching C. messagewarning D. namespace View Answer Ans : D
Explanation: The package namespace is one of the most confusing parts of building a package. nbpMatching contains functions for non-bipartite optimal matching. 7. What will be the output of the following R code?
install.packages(c("devtools", "roxygen2"))
A. Develops the tools B. Installs the given packages C. Exits R studio D. Nothing happens View Answer Ans : B
Explanation: Make sure you have the latest version of R and then run the above code to get the packages you’ll need. It installs the given packages. Confirm that you have a recent version of RStudio. 8. A bundled package is a package that’s been compressed into a ______ file.
A. Double B. Single C. Triple D. No File View Answer Ans : B
Explanation: A bundled package is a package that’s been compressed into a single file. A source package is just a directory with components like R/, DESCRIPTION, and so on. 9. .library() is not useful when developing a package since you have to install the package first.
A. TRUE B. FALSE C. Can be true or false D. Can not say View Answer Ans : A
Explanation: library() is not useful when developing a package since you have to install the package first. A library is a simple directory containing installed packages.
10. DESCRIPTION uses a very simple file format called DCF.
A. TRUE B. FALSE C. Can be true or false D. Can not say View Answer Ans : A
Explanation: DESCRIPTION uses a very simple file format called DCF, the Debian control format. When you first start writing packages, you’ll mostly use these metadata to record what packages are needed to run your package.
37. While installing Hadoop, how many XML files are edited and which are they? 1. core-site.xml 2. hdfs-site.xml 3. mapred-site.xml 4. yarn-site.xml Answer: All four files (core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml) are edited.
This set of Object Oriented Programming using C++ Assessment Questions and Answers focuses on “Pointer to Objects”. 1. Which language among the following doesn’t allow pointers? a) C++ b) Java c) Pascal d) C Answer: b Explanation: The concept of pointers is not supported in Java. The feature is not given in the language but can be used in some ways explicitly. Though this pointer is supported by java too. 2. Which is correct syntax for declaring pointer to object? a) className* objectName; b) className objectName; c) *className objectName; d) className objectName(); Answer: a Explanation: The syntax must contain * symbol after the className as the type of object. This declares an object pointer. This can store address of any object of the specified class. 3. Which operator should be used to access the members of the class using object pointer? a) Dot operator b) Colon to the member c) Scope resolution operator d) Arrow operator Answer: d Explanation: The members can be accessed from the object pointer by using arrow operator. The arrow operator can be used only with the pointer of class type. If simple object is declared, it must use dot operator to access the members. 4. How does compiler decide the intended object to be used, if more than one object are used? a) Using object name b) Using an integer pointer c) Using this pointer d) Using void pointer Answer: c Explanation: This pointer denotes the object, in which it is being used. If member function is called with respect to one object then this pointer refers to the same object members. It can be used when members with same name are involved. 5. If pointer to an object is declared __________ a) It can store any type of address b) It can store only void addresses c) It can only store address of integer type d) It can only store object address of class type specified Answer: d Explanation: The address of only the specified class type can get their address stored in the object pointer. The addresses doesn’t differ but they do differ for the amount and type of memory required for objects of different classes. Hence same class object pointer should be used. 6. What is the size of an object pointer? a) Equal to size of any usual pointer b) Equal to size of sum of all the members of object c) Equal to size of maximum sized member of object d) Equal to size of void Answer: a Explanation: The size of object pointer is same as that of any usual pointer. This is because only the address have to be stored. There are no values to be stored in the pointer. 7. A pointer _________________ a) Can point to only one object at a time b) Can point to more than one objects at a time c) Can point to only 2 objects at a time d) Can point to whole class objects at a time Answer: a Explanation: The object pointer can point to only one object at a time. The pointer will be able to store only one address at a
time. Hence only one object can be referred. 8. Pointer to a base class can be initialized with the address of derived class, because of _________ a) derived-to-base implicit conversion for pointers b) base-to-derived implicit conversion for pointers c) base-to-base implicit conversion for pointers d) derived-to-derived implicit conversion for pointers Answer: a Explanation: It is an implicit rule defined in most of the programming languages. It permits the programmer to declare a pointer to the derived class from a base class pointer. In this way the programmer doesn’t have to declare object for derived class each time it is required. 9. Can pointers to object access the private members of the class? a) Yes, always b) Yes, only if it is only pointer to object c) No, because objects can be referenced from another objects too d) No, never Answer: d Explanation: The pointers to an object can never access the private members of the class outside the class. The object can indirectly use those private members using member functions which are public in the class. 10. Is name of an array of objects is also a pointer to object? a) Yes, always b) Yes, in few cases c) No, because it represents more than one object d) No, never Answer: a Explanation: The array name represents a pointer to the object. The name alone can represent the starting address of the array. But that also represents an array which is in turn stored in a pointer. 11. Which among the following is true? a) The pointer to object can hold address only b) The pointer can hold value of any type c) The pointer can hold only void reference d) The pointer can’t hold any value Answer: a Explanation: The pointer to an object can hold only the addresses. Address of any other object of same class. This allows the programmer to link more than one objects if required. 12. Which is the correct syntax to call a member function using pointer? a) pointer->function() b) pointer.function() c) pointer::function() d) pointer:function() Answer: a Explanation: The pointer should be mentioned followed by the arrow operator. Arrow operator is applicable only with the pointers. Then the function name should be mentioned that is to be called. 13. If a pointer to an object is created and the object gets deleted without using the pointer then __________ a) It becomes void pointer b) It becomes dangling pointer c) It becomes null pointer d) It becomes zero pointer Answer: b Explanation: When the address pointed by the object pointer gets deleted, the pointer now points to an invalid address. Hence it becomes a dangling pointer. It can’t be null or void pointer since it doesn’t point to any specific location. 14. How can the address stored in the pointer be retrieved? a) Using * symbol b) Using $ symbol c) Using & symbol d) Using @ symbol Answer: c Explanation: The & symbol must be used. This should be done such that the object should be preceded by & symbol and then
the address should be stored in another variable. This is done to get the address where the object is stored. 15. What should be done to prevent changes that may be made to the values pointed by the pointer? a) Usual pointer can’t change the values pointed b) Pointer should be made virtual c) Pointer should be made anonymous d) Pointer should be made const Answer: d Explanation: The pointer should be declared as a const type. This prevents the pointer to change any value that is being pointed from it. This is a feature that is made to access the values using pointer but to make sure that pointer doesn’t change those values accidently. 16. References to object are same as pointers of object. a) True b) False Answer: b Explanation: The references are made to object when the object is created and initialized with another object without calling any constructor. But the object pointer must be declared explicitly using * symbol that will be capable of storing some address. Hence both are different.
This set of Basic Object Oriented Programming using C++ Questions and Answers focuses on “Copy Constructor”. 1. Copy constructor is a constructor which ________________ a) Creates an object by copying values from any other object of same class b) Creates an object by copying values from first object created for that class c) Creates an object by copying values from another object of another class d) Creates an object by initializing it with another previously created object of same class Answer: d Explanation: The object that has to be copied to new object must be previously created. The new object gets initialized with the same values as that of the object mentioned for being copied. The exact copy is made with values. 2. The copy constructor can be used to ____________ a) Initialize one object from another object of same type b) Initialize one object from another object of different type c) Initialize more than one object from another object of same type at a time d) Initialize all the objects of a class to another object of another class Answer: a Explanation: The copy constructor has the most basic function to initialize the members of an object with same values as that of some previously created object. The object must be of same class. 3. If two classes have exactly same data members and member function and only they differ by class name. Can copy constructor be used to initialize one class object with another class object? a) Yes, possible b) Yes, because the members are same c) No, not possible d) No, but possible if constructor is also same Answer: c Explanation: The restriction for copy constructor is that it must be used with the object of same class. Even if the classes are exactly same the constructor won’t be able to access all the members of another class. Hence we can’t use object of another class for initialization. 4. The copy constructors can be used to ________ a) Copy an object so that it can be passed to a class b) Copy an object so that it can be passed to a function c) Copy an object so that it can be passed to another primitive type variable d) Copy an object for type casting Answer: b Explanation: When an object is passed to a function, actually its copy is made in the function. To copy the values, copy constructor is used. Hence the object being passed and object being used in function are different. 5. Which returning an object, we can use ____________ a) Default constructor b) Zero argument constructor c) Parameterized constructor d) Copy constructor Answer: d Explanation: While returning an object we can use the copy constructor. When we assign the return value to another object of same class then this copy constructor will be used. And all the members will be assigned the same values as that of the object being returned. 6. If programmer doesn’t define any copy constructor then _____________ a) Compiler provides an implicit copy constructor b) Compiler gives an error c) The objects can’t be assigned with another objects d) The program gives run time error if copying is used Answer: a Explanation: The compiler provides an implicit copy constructor. It is not mandatory to always create an explicit copy constructor. The values are copied using implicit constructor only. 7. If a class implements some dynamic memory allocations and pointers then _____________ a) Copy constructor must be defined b) Copy constructor must not be defined c) Copy constructor can’t be defined d) Copy constructor will not be used
Answer: a Explanation: In the case where dynamic memory allocation is used, the copy constructor definition must be given. The implicit copy constructor is not capable of manipulating the dynamic memory and pointers. Explicit definition allows to manipulate the data as required. 8. What is the syntax of copy constructor? a) classname (classname &obj){ /*constructor definition*/ } b) classname (cont classname obj){ /*constructor definition*/ } c) classname (cont classname &obj){ /*constructor definition*/ } d) classname (cont &obj){ /*constructor definition*/ } Answer: c Explanation: The syntax must contain the class name first, followed by the classname as type and &object within parenthesis. Then comes the constructor body. The definition can be given as per requirements. 9. Object being passed to a copy constructor ___________ a) Must be passed by reference b) Must be passed by value c) Must be passed with integer type d) Must not be mentioned in parameter list Answer: a Explanation: This is mandatory to pass the object by reference. Otherwise, the object will try to create another object to copy its values, in turn a constructor will be called, and this will keep on calling itself. This will cause the compiler to give out of memory error. 10. Out of memory error is given when the object _____________ to the copy constructor. a) Is passed with & symbol b) Is passed by reference c) Is passed as <classname &obj> d) Is not passed by reference Answer: d Explanation: All the options given, directly or indirectly indicate that the object is being passed by reference. And if object is not passed by reference then the out of memory error is produced. Due to infinite constructor call of itself. 11. Copy constructor will be called whenever the compiler __________ a) Generates implicit code b) Generates member function calls c) Generates temporary object d) Generates object operations Answer: c Explanation: Whenever the compiler creates a temporary object, copy constructor is used to copy the values from existing object to the temporary object. 12. The deep copy is possible only with the help of __________ a) Implicit copy constructor b) User defined copy constructor c) Parameterized constructor d) Default constructor Answer: b Explanation: While using explicit copy constructor, the pointers of copied object point to the intended memory location. This is assured since the programmers themselves manipulate the addresses. 13. Can a copy constructor be made private? a) Yes, always b) Yes, if no other constructor is defined c) No, never d) No, private members can’t be accessed Answer: a Explanation: The copy constructor can be defined as private. If we make it private then the objects of the class can’t be copied. It can be used when a class used dynamic memory allocation. 14. The arguments to a copy constructor _____________ a) Must be const b) Must not be cosnt c) Must be integer type
d) Must be static Answer: a Explanation: The object should not be modified in the copy constructor. Because the object itself is being copied. When the object is returned from a function, the object must be a constant otherwise the compiler creates a temporary object which can die anytime. 15. Copy constructors are overloaded constructors. a) True b) False Answer: a Explanation: The copy constructors are always overloaded constructors. They have to be. All the classes have a default constructor and other constructors are basically overloaded constructors.
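For reference, here is a minimal, illustrative C++ sketch (not part of the original question set; the class name Buffer is invented for the example) showing the const-reference copy-constructor syntax discussed above and a user-defined deep copy for a class that owns dynamic memory:

#include <cstring>
#include <iostream>

class Buffer {
    char* data;                               // dynamically allocated member
public:
    Buffer(const char* s) : data(new char[std::strlen(s) + 1]) { std::strcpy(data, s); }
    // Copy constructor: takes a const reference (passing by value would call
    // itself endlessly) and deep-copies the dynamic memory.
    Buffer(const Buffer& other) : data(new char[std::strlen(other.data) + 1]) {
        std::strcpy(data, other.data);
    }
    ~Buffer() { delete[] data; }
    void print() const { std::cout << data << '\n'; }
};

int main() {
    Buffer a("hello");
    Buffer b = a;        // copy constructor is invoked here
    b.print();
    return 0;
}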
This set of Object Oriented Programming using C++ Interview Questions and Answers for Experienced people focuses on “Passing Object to Functions”. 1. Passing object to a function _______________ a) Can be done only in one way b) Can be done in more than one ways c) Is not possible d) Is not possible in OOP Answer: b Explanation: The objects can be passed to the functions and this requires OOP concept because objects are main part of OOP. The objects can be passed in more than one way to a function. The passing depends on how the object have to be used. 2. The object ________________ a) Can be passed by reference b) Can be passed by value c) Can be passed by reference or value d) Can be passed with reference Answer: c Explanation: The objects can be passed by reference if required to use the same object. The values can be passed so that the main object remains same and no changes are made to it if the function makes any changes to the values being passed. 3. Which symbol should be used to pass the object by reference in C++? a) & b) @ c) $ d) $ or & Answer: a Explanation: The object to be passed by reference to the function should be preceded by & symbol in the argument list syntax of the function. This indicates the compiler not to use new object. The same object which is being passed have to be used. 4. If object is passed by value ______________ a) Copy constructor is used to copy the values into another object in the function b) Copy constructor is used to copy the values into temporary object c) Reference to the object is used to access the values of the object d) Reference to the object is used to created new object in its place Answer: a Explanation: The copy constructor is used. This constructor is used to copy the values into a new object which will contain all the values same as that of the object being passed but any changes made to the newly created object will not affect the original object. 5. Pass by reference of an object to a function _______________ a) Affects the object in called function only b) Affects the object in prototype only c) Affects the object in caller function d) Affects the object only if mentioned with & symbol with every call Answer: c Explanation: The original object in the caller function will get affected. The changes made in the called function will be same in the caller function object also. 6. Copy constructor definition requires __________________ a) Object to be passed by value b) Object not to be passed to it c) Object to be passed by reference d) Object to be passed with each data member value Answer: c Explanation: The object must be passed by reference to a copy constructor. This is to avoid the out of memory error. The constructors keeps calling itself, if not passed by reference, and goes out of memory. 7. What is the type of object that should be specified in the argument list? a) Function name b) Object name itself c) Caller function name d) Class name of object Answer: d
Explanation: The type of object is the class itself. The class name have to be specified in order to pass the objects to a function. This allows the program to create another object of same class or to use the same object that was passed. 8. If an object is passed by value, _________________ a) Temporary object is used in the function b) Local object in the function is used c) Only the data member values are used d) The values are accessible from the original object Answer: b Explanation: When an object is called by values, copy constructor is called and object is copied to the local object of the function which is mentioned in the argument list. The values gets copied and are used from the local object. There is no need to access the original object again. 9. Can data members be passed to a function using the object? a) Yes, it can be passed only inside class functions b) Yes, only if the data members are public and are being passed to a function outside the class c) No, can’t be passed outside the class d) No, can’t be done Answer: b Explanation: The data members can be passed with help of object but only if the member is public. The object will obviously be used outside the class. The object must have access to the data member so that its value or reference is used outside the class which is possible only if the member is public. 10. What exactly is passed when an object is passed by reference? a) The original object name b) The original object class name c) The exact address of the object in memory d) The exact address of data members Answer: c Explanation: The location of the object, that is, the exact memory location is passed, when the object is passed by reference. The pass by reference is actually a reference to the object that the function uses with another name to the same memory location as the original object uses. 11. If the object is not to be passed to any function but the values of the object have to be used then? a) The data members should be passed separately b) The data members and member functions have to be passed separately c) The values should be present in other variables d) The object must be passed Answer: a Explanation: The data members can be passed separately. There is no need to pass whole object, instead we can use the object to pass only the required values. 12. Which among the following is true? a) More than one object can’t be passed to a function b) Any number of objects can be passed to a function c) Objects can’t be passed, only data member values can be passed d) Objects should be passed only if those are public in class Answer: b Explanation: There is no restriction on passing the number of objects to a function. The operating system or the compiler or environment may limit the number of arguments. But there is no limit on number of objects till that limit. 13. What will be the output if all necessary code is included (Header files and main function)? void test (Object &y) { y = "It is a string"; } void main() { Object x = null; test (x); System.out.println (x); } a) Run time error
b) Compile time error c) Null d) It is a string Answer: d Explanation: This is because the x object is passed by reference. The changes made inside the function will be applicable to original function too. 14. In which type is new memory location will be allocated? a) Only in pass by reference b) Only in pass by value c) Both in pass by reference and value d) Depends on the code Answer: b Explanation: The new memory location will be allocated only if the object is passed by value. Reference uses the same memory address and is denoted by another name also. But in pass by value, another object is created and new memory space is allocated for it. 15. Pass by reference and pass by value can’t be done simultaneously in a single function argument list. a) True b) False Answer: b Explanation: There is no condition which specifies that only the reference pass or values pass is allowed. The argument list can contain one reference pass and another value pass. This helps to manipulate the objects with functions more easily.
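As a small illustration of the pass-by-value versus pass-by-reference behaviour described in the questions above (the class and function names are invented for the example):

#include <iostream>

class Counter {
public:
    int value = 0;
};

void byValue(Counter c)      { c.value = 100; }   // modifies a local copy only
void byReference(Counter& c) { c.value = 100; }   // modifies the caller's object

int main() {
    Counter x;
    byValue(x);
    std::cout << x.value << '\n';   // prints 0: the original is untouched
    byReference(x);
    std::cout << x.value << '\n';   // prints 100: the original was changed
    return 0;
}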
This set of Object Oriented Programming using C++ Interview Questions and Answers for freshers focuses on “Overriding Member Functions”. 1. Which among the following best describes member function overriding? a) Member functions having same name in base and derived classes b) Member functions having same name in base class only c) Member functions having same name in derived class only d) Member functions having same name and different signature inside main function Answer: a Explanation: The member function which is defined in base class and again in the derived class, is overridden by the definition given in the derived class. This is because the preference is given more to the local members. When derived class object calls that function, definition from the derived class is used. 2. Which among the following is true? a) Inheritance must not be used when overriding is used b) Overriding can be implemented without using inheritance c) Inheritance must be done, to use overriding d) Inheritance is mandatory only if more than one function is overridden Answer: c Explanation: The inheritance must be used in order to use function overriding. If inheritance is not used, the functions can only be overloaded. There must be a base class and a derived class to override the function of base class. 3. Which is the correct condition for function overriding? a) The declaration must not be same in base and derived class b) The declaration must be exactly the same in base and derived class c) The declaration should have at least 1 same argument in declaration of base and derived class d) The declaration should have at least 1 different argument in declaration of base and derived class Answer: b Explanation: For a function to be overridden, the declaration must be exactly the same. There must not be any different syntax used. This will ensure that the function to be overridden is only the one intended to be overridden from the derived class. 4. Exactly same declaration in base and derived class includes______________ a) Only same name b) Only same return type and name c) Only same return type and argument list d) All the same return type, name and parameter list Answer: d Explanation: Declaration includes the whole prototype of the function. The return type, name and the parameter list must be same in order to confirm that the function is same in derived and the base class. And hence can be overridden. 5. Which function will be overridden by the function defined in the derived class below: class A { int i; void show() { cout<<i; } void print() { cout <<i; } }; class B { int j; void show() { cout<<j; } }; a) show() b) print() c) show() and print()
d) Compile time error Answer: a Explanation: The declaration must be exactly same in the derived class and base class. The derived class have defined show() function with exactly same declaration. This then shows that the function in base class is being overridden if show() is called from the object of class B. 6. How to access the overridden method of base class from the derived class? a) Using arrow operator b) Using dot operator c) Using scope resolution operator d) Can’t be accessed once overridden Answer: c Explanation: Scope resolution operator :: can be used to access the base class method even if overridden. To access those, first base class name should be written followed by the scope resolution operator and then the method name. 7. The functions to be overridden _____________ a) Must be private in base class b) Must not be private base class c) Must be private in both derived and base class d) Must not be private in both derived and base class Answer: b Explanation: If the function is private in the base class, derived class won’t be able to access it. When the derived class can’t access the function to be overridden then it won’t be able to override it with any definition. 8. Which language doesn’t support the method overriding implicitly? a) C++ b) C# c) Java d) SmallTalk Answer: b Explanation: The feature of method overriding is not provided in C#. To override the methods, one must use override or virtual keywords explicitly. This is done to remove accidental changes in program and unintentional overriding. 9. In C# ____________________ a) Non – virtual or static methods can’t be overridden b) Non – virtual and static methods only can be overridden c) Overriding is not allowed d) Overriding must be implemented using C++ code only Answer: a Explanation: The non-virtual and static methods can’t be overridden in C# language. The restriction is made from the language implicitly. Only the methods that are abstract, virtual or override can be overridden. 10. In Delphi ______________ a) Method overriding is done implicitly b) Method overriding is not supported c) Method overriding is done with directive override d) Method overriding is done with the directive virtually Answer: c Explanation: This is possible but only if the method to be overridden is marked as dynamic or virtual. It is inbuilt restriction of programming language. This is done to reduce the accidental or unintentional overriding. 11. What should be used to call the base class method from the derived class if function overriding is used in Java? a) Keyword super b) Scope resolution c) Dot operator d) Function name in parenthesis Answer: a Explanation: The keyword super must be used to access base class members. Even when overriding is used, super must be used with the dot operator. The overriding is possible. 12. In Kotlin, the function to be overridden must be ______________ a) Private b) Open c) Closed
d) Abstract Answer: b Explanation: The function to be overridden must be open. This is a condition in Kotlin for any function to be overridden. This avoids accidental overriding. 13. Abstract functions of a base class _________________ a) Are overridden by the definition in same class b) Are overridden by the definition in parent class c) Are not overridden generally d) Are overridden by the definition in derived class Answer: d Explanation: The functions declared to be abstract in base class are redefined in derived classes. That is, the functions are overridden by the definitions given in the derived classes. This must be done to give at least one definition to each undefined function. 14. If virtual functions are defined in the base class then _______________ a) It is not necessary for derived classes to override those functions b) It is necessary for derived classes to override those functions c) Those functions can never be derived d) Those functions must be overridden by all the derived classes Answer: a Explanation: The derived classes doesn’t have to redefine and override the base class functions. If one definition is already given it is not mandatory for any derived class to override those functions. The base class definition will be used. 15. Which feature of OOP is exhibited by the function overriding? a) Inheritance b) Abstraction c) Polymorphism d) Encapsulation Answer: c Explanation: The polymorphism feature is exhibited by function overriding. Polymorphism is the feature which basically defines that same named functions can have more than one functionalities.
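A minimal sketch of the overriding behaviour and of reaching the base-class version with the scope resolution operator, as discussed above (the class names are illustrative):

#include <iostream>

class Base {
public:
    void show() { std::cout << "Base::show\n"; }
};

class Derived : public Base {
public:
    void show() { std::cout << "Derived::show\n"; }   // same declaration overrides Base::show
};

int main() {
    Derived d;
    d.show();         // Derived::show, the local (derived) definition is preferred
    d.Base::show();   // scope resolution reaches the overridden base version
    return 0;
}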
This set of Object Oriented Programming using C++ Interview Questions and Answers focuses on “Passing and Returning Object with Functions”. 1. In how many ways can an object be passed to a function? a) 1 b) 2 c) 3 d) 4 Answer: c Explanation: The objects can be passed in three ways. Pass by value, pass by reference and pass by address. These are the general ways to pass the objects to a function. 2. If an object is passed by value _____________ a) A new copy of object is created implicitly b) The object itself is used c) Address of the object is passed d) A new object is created with new random values Answer: a Explanation: When an object is passed by value, a new object is created implicitly. This new object uses the implicit values assignment, same as that of the object being passed. 3. Pass by address passes the address of object _________ and pass by reference passes the address of the object _________ a) Explicitly, explicitly b) Implicitly, implicitly c) Explicitly, Implicitly d) Implicitly, explicitly Answer: c Explanation: Pass by address uses the explicit address passing to the function whereas pass by reference implicitly passes the address of the object. 4. If an object is passed by reference, the changes made in the function ___________ a) Are reflected to the main object of caller function too b) Are reflected only in local scope of the called function c) Are reflected to the copy of the object that is made during pass d) Are reflected to caller function object and called function object also Answer: a Explanation: When an object is passed by reference, its address is passed implicitly. This will make changes to the main function whenever any modification is done. 5. Constructor function is not called when an object is passed to a function, will its destructor be called when its copy is destroyed? a) Yes, depending on code b) Yes, must be called c) No, since no constructor was called d) No, since same object gets used Answer: b Explanation: Even though the constructor is not called when the object is passed to a function, the copy of the object is still created, where the values of the members are same. When the object have to be destroyed, the destructor is called to free the memory and resources that the object might have reserved. 6. When an object is returned by a function, a _______________ is automatically created to hold the return value. a) Temporary object b) Virtual object c) New object d) Data member Answer: a Explanation: The temporary object is created. It holds the return value. The values gets assigned as required, and the temporary object gets destroyed. 7. Is the destruction of temporary object safe (while returning object)? a) Yes, the resources get free to use b) Yes, other objects can use the memory space c) No, unexpected side effects may occur
d) No, always gives rise to exceptions Answer: c Explanation: The destruction of temporary variable may give rise to unexpected logical errors. Consider the destructor which may free the dynamically allocated memory. But this may abort the program if another is still trying to copy the values from that dynamic memory. 8. How to overcome the problem arising due to destruction of temporary object? a) Overloading insertion operator b) Overriding functions can be used c) Overloading parenthesis or returning object d) Overloading assignment operator and defining copy constructor Answer: d Explanation: The problem can be solved by overloading the assignment operator to get the values that might be getting returned while the destructor free the dynamic memory. Defining copy constructor can help us to do this in even simpler way. 9. How many objects can be returned at once? a) Only 1 b) Only 2 c) Only 16 d) As many as required Answer: a Explanation: Like any other value, only one object can be returned at ones. The only possible way to return more than one object is to return address of an object array. But that again comes under returning object pointer. 10. What will be the output of the following code? Class A { int i; public : A(int n) { i=n; cout<<”inside constructor ”; } ~A() { cout<<”destroying ”<<i; } void seti(int n) { i=n; } int geti() { return I; } }; void t(A ob) { cout<<”something ”; } int main() { A a(1); t(a); cout<<”this is i in main ”; cout<<a.geti(); } a) inside constructor something destroying 2this is i in main destroying 1 b) inside constructor something this is i in main destroying 1 c) inside constructor something destroying 2this is i in main d) something destroying 2this is i in main destroying 1 Answer: a Explanation: Although the object constructor is called only ones, the destructor will be called twice, because of destroying the copy of the object that is temporarily created. This is the concept of how the object should be passed and manipulated.
11. It is necessary to return the object if it was passed by reference to a function. a) Yes, since the object must be same in caller function b) Yes, since the caller function needs to reflect the changes c) No, the changes are made automatically d) No, the changes are made explicitly Answer: c Explanation: Having the address being passed to the function, the changes are automatically made to the main function. In all the cases if the address is being used, the same memory location will be updated with new values. 12. How many objects can be passed to a function simultaneously? a) Only 1 b) Only an array c) Only 1 or an array d) As many as required Answer: d Explanation: There is no limit to how many objects can be passed. This works in same way as that any other variable gets passed. Array and object can be passed at same time also. 13. If an object is passed by address, will be constructor be called? a) Yes, to allocate the memory b) Yes, to initialize the members c) No, values are copied d) No, temporary object is created Answer: c Explanation: A copy of all the values is created. If the constructor is called, there will be a compile time error or memory shortage. This happens because each time a constructor is called, it try to call itself again and that goes infinite times. 14. Is it possible that an object of is passed to a function, and the function also have an object of same name? a) No, Duplicate declaration is not allowed b) No, 2 objects will be created c) Yes, Scopes are different d) Yes, life span is different Answer: a Explanation: There can’t be more than one variable or object with the same name in same scope. The scope is same, since the object is passed, it becomes local to function and hence function can’t have one more object of same name. 15. Passing an object using copy constructor and pass by value are same. a) True b) False Answer: b Explanation: The copy constructor is used to copy the values from one object to other. Pass by values is not same as copy constructor method. Actually the pass by value method uses a copy constructor to copy the values in a local object.
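A compact sketch of passing and returning objects as described above; the exact constructor/destructor output depends on the compiler's copy-elision rules, so treat the comments as a guide rather than a guarantee (the names are invented for the example):

#include <iostream>

class Item {
public:
    int n;
    Item(int v) : n(v)            { std::cout << "ctor "; }
    Item(const Item& o) : n(o.n)  { std::cout << "copy "; }
    ~Item()                       { std::cout << "dtor(" << n << ") "; }
};

void use(Item x) { x.n = 99; }      // pass by value: a copy is made and later destroyed

Item make() { return Item(7); }     // returning an object (the copy may be elided)

int main() {
    Item a(1);
    use(a);                         // copy constructed for the parameter; its destructor runs at return
    Item b = make();
    std::cout << "\n" << a.n << ' ' << b.n << '\n';
    return 0;
}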
This set of Object Oriented Programming using C++ Multiple Choice Questions & Answers focuses on “Private Member Functions”. 1. Which is private member functions access scope? a) Member functions which can only be used within the class b) Member functions which can used outside the class c) Member functions which are accessible in derived class d) Member functions which can’t be accessed inside the class Answer: a Explanation: The member functions can be accessed inside the class only if they are private. The access is scope is limited to ensure the security of the private members and their usage. 2. Which among the following is true? a) The private members can’t be accessed by public members of the class b) The private members can be accessed by public members of the class c) The private members can be accessed only by the private members of the class d) The private members can’t be accessed by the protected members of the class Answer: b Explanation: The private members are accessible within the class. There is no restriction on use of private members by public or protected members. All the members can access the private member functions of the class. 3. Which member can never be accessed by inherited classes? a) Private member function b) Public member function c) Protected member function d) All can be accessed Answer: a Explanation: The private member functions can never be accessed in the derived classes. The access specifiers is of maximum security that allows only the members of self class to access the private member functions. 4. Which syntax among the following shows that a member is private in a class? a) private: functionName(parameters) b) private(functionName(parameters)) c) private functionName(parameters) d) private::functionName(parameters) Answer: c Explanation: The function declaration must contain private keyword follower by the return type and function name. Private keyword is followed by normal function declaration. 5. If private member functions are to be declared in C++ then _____________ a) private: <all private members> b) private <member name> c) private(private member list) d) private :- <private members> Answer: a Explanation: The private members doesn’t have to have the keyword with each private member. We only have to specify the keyword private followed by single colon and then private member’s are listed. 6. In java, which rule must be followed? a) Keyword private preceding list of private member’s b) Keyword private with a colon before list of private member’s c) Keyword private with arrow before each private member d) Keyword private preceding each private member Answer: d Explanation: The private keyword must be mentioned before each private member. Unlike the rule in C++ to specify private once and list all other private member’s, in java all member declarations must be preceded by the keyword private. 7. How many private member functions are allowed in a class? a) Only 1 b) Only 7 c) Only 255 d) As many as required Answer: d Explanation: There are no conditions applied on the number of private member functions that can be declared in a class. Though
the system may restrict use of too many functions depending on memory. 8. How to access a private member function of a class? a) Using object of class b) Using object pointer c) Using address of member function d) Using class address Answer: c Explanation: Even the private member functions can be called outside the class. This is possible if address of the function is known. We can use the address to call the function outside the class. 9. Private member functions ____________ a) Can’t be called from enclosing class b) Can be accessed from enclosing class c) Can be accessed only if nested class is private d) Can be accessed only if nested class is public Answer: a Explanation: The nested class members can’t be accessed in the enclosed class even though other members can be accessed. This is to ensure the class members security and not to go against the rules of private members. 10. Which function among the following can’t be accessed outside the class in java in same package? a) public void show() b) void show() c) protected show() d) static void show() Answer: c Explanation: The protected members are available within the class. And are also available in derived classes. But these members are treated as private members for outside the class and inheritance structure. Hence can’t be accessed. 11. If private members are to be called outside the class, which is a good alternative? a) Call a public member function which calls private function b) Call a private member function which calls private function c) Call a protected member function which calls private function d) Not possible Answer: a Explanation: The private member functions can be accessed within the class. A public member function can be called which in turn calls the private member function. This maintains the security and adheres to the rules of private members. 12. A private function of a derived class can be accessed by the parent class. a) True b) False Answer: b Explanation: If private functions get accessed even by the parent class that will violate the rules of private members. If the functions can be accessed then the derived class security is hindered. 13. Which error will be produced if private members are accessed? a) Can’t access private message b) Code unreachable c) Core dumped d) Bad code Answer: a Explanation: The private members access from outside the class produce an error. The error states that the code at some line can’t access the private members. And denies the access terminating the program. 14. Can main() function be made private? a) Yes, always b) Yes, if program doesn’t contain any classes c) No, because main function is user defined d) No, never Answer: d Explanation: The reason given in option “No, because main function is user defined” is wrong. The proper reason that the main function should not be private is that it should be accessible in whole program. This makes the program flexible. 15. If a function in java is declared private then it __________________
a) Can’t access the standard output b) Can access the standard output c) Can’t access any output stream d) Can access only the output streams Answer: b Explanation: The private members can access any standard input or output. There is no restriction on access to any input or output stream. Since standard input can also be used, the option “can access only the output streams” is not true.
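A short sketch of the "call a public member which calls the private function" idea from question 11 above (the class and member names are invented):

#include <iostream>

class Account {
    double balance = 0.0;
    bool isValid(double amount) { return amount > 0.0; }   // private helper
public:
    void deposit(double amount) {            // public wrapper calls the private helper
        if (isValid(amount)) balance += amount;
    }
    double getBalance() const { return balance; }
};

int main() {
    Account acc;
    acc.deposit(50.0);       // allowed: goes through the public interface
    // acc.isValid(50.0);    // error: 'isValid' is private
    std::cout << acc.getBalance() << '\n';
    return 0;
}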
This set of Object Oriented Programming using C++ Problems focuses on “Types of Member Functions”. 1. How many types of member functions are possible in general? a) 2 b) 3 c) 4 d) 5 Answer: d Explanation: There are basically 5 types of member functions possible. The types include simple, static, const, inline, and friend member functions. Any of these types can be used in a program as per requirements. 2. Simple member functions are ______________________ a) Ones defined simply without any type b) Ones defined with keyword simple c) Ones that are implicitly provided d) Ones which are defined in all the classes Answer: a Explanation: When there is no type defined for any function and just a simple syntax is used with the return type, function name and parameter list then those are known as simple member functions. This is a general definition of simple members. 3. What are static member functions? a) Functions which use only static data member but can’t be accessed directly b) Functions which uses static and other data members c) Functions which can be accessed outside the class with the data members d) Functions using only static data and can be accessed directly in main() function Answer: d Explanation: The static member functions can be accessed directly in the main function. There is no restriction on direct use. We can call them with use of objects also. But the restriction is that the static member functions can only use the static data members of the class. 4. How can static member function can be accessed directly in main() function? a) Dot operator b) Colon c) Scope resolution operator d) Arrow operator Answer: c Explanation: The static member functions can be accessed directly in the main() function. The only restriction is that those must use only static data members of the class. These functions are property of class rather than each object. 5. Correct syntax to access the static member functions from the main() function is ______________ a) classObject::functionName(); b) className::functionName(); c) className:classObject:functionName(); d) className.classObject:functionName(); Answer: b Explanation: The syntax in option b must be followed in order to call the static functions directly from the main() function. That is a predefined syntax. Scope resolution helps to spot the correct function in the correct class. 6. What are const member functions? a) Functions in which none of the data members can be changed in a program b) Functions in which only static members can be changed c) Functions which treat all the data members as constant and doesn’t allow changes d) Functions which can change only the static members Answer: c Explanation: The const member functions are intended to keep the value of all the data members of a class same and doesn’t allow any changes on them. The data members are treated as constant data and any modification inside the const function is restricted. 7. Which among the following best describes the inline member functions? a) Functions defined inside the class only b) Functions with keyword inline only c) Functions defined outside the class d) Functions defined inside the class or with the keyword inline Answer: d
Explanation: The functions which are defined with the keyword inline or are defined inside the class are treated to be inline functions. Definitions inside the class are implicitly made inline if none of the complex statements are used in the definition. 8. What are friend member functions (C++)? a) Member function which can access all the members of a class b) Member function which can modify any data of a class c) Member function which doesn’t have access to private members d) Non-member functions which have access to all the members (including private) of a class Answer: d Explanation: A non-member function of a class which can access even the private data of a class is a friend function. It is an exception on access to private members outside the class. It is sometimes considered as a member function since it has all the access that a member function in general has. 9. What is the syntax of a const member function? a) void fun() const {} b) void fun() constant {} c) void const fun() {} d) const void fun(){} Answer: a Explanation: The general syntax to be followed in order to declare a const function in a class is as in option a. The syntax may vary in different programming languages. 10. Which keyword is used to make a non-member function a friend function of a class? a) friendly b) new c) friend d) connect Answer: c Explanation: The keyword friend is provided in programming languages to use whenever a function is to be made a friend of one class or other. The keyword indicates that the function is capable of new functionalities like accessing private members. 11. Member functions _____________________ a) Must be defined inside class body b) Can be defined inside class body or outside c) Must be defined outside the class body d) Can be defined in another class Answer: b Explanation: The function definitions can be given inside or outside the body of class. If defined inside, general syntax is used. If defined outside then the class name followed by scope resolution operator and then function name must be given for the definition. 12. All types of member functions can’t be used inside a single class. a) True b) False Answer: b Explanation: There is no restriction on the use of any type of member function inside a single class. Any type any number of times can be defined inside a class. The member functions can be used as required. 13. Which among the following is true? a) Member functions can never be private b) Member functions can never be protected c) Member functions can never be public d) Member functions can be defined in any access specifier Answer: d Explanation: The member functions can be defined inside any specifier. There is no restriction. The programmer can apply restrictions on its use by specifying the access specifier with the functions. 14. Which keyword is used to define the static member functions? a) static b) stop c) open d) state Answer: a Explanation: The static keyword is used to declare any static member function in a class. The static members become common
to each object of the class being created. They share the same values. 15. Which keyword is used to define the inline member function? a) no keyword required b) inline c) inlined d) line Answer: b Explanation: The inline keyword is used to defined the inline member functions in a class. The functions are implicitly made inline if defined inside the class body, but only if they doesn’t have any complex statement inside. All functions defined outside the class body must be mentioned with an explicit inline keyword.
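The five kinds of member functions mentioned above, sketched in one illustrative class (the names are invented; this is only a summary example, not a prescribed pattern):

#include <iostream>

class Widget {
    int id;
    static int count;                          // shared by all objects of the class
public:
    Widget(int i) : id(i) { ++count; }         // simple member function (constructor)
    int getId() const { return id; }           // const member: cannot modify data members
    inline void show() const { std::cout << id << '\n'; }   // inline member
    static int howMany() { return count; }     // static member: uses only static data
    friend void inspect(const Widget& w);      // friend: non-member with full access
};

int Widget::count = 0;

void inspect(const Widget& w) { std::cout << "id=" << w.id << '\n'; }  // may read private data

int main() {
    Widget a(1), b(2);
    std::cout << Widget::howMany() << '\n';    // called as className::functionName()
    a.show();
    inspect(b);
    return 0;
}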
This set of Object Oriented Programming (OOPs) using C++ Multiple Choice Questions & Answers (MCQs) focuses on “Abstract Class”. 1. Which among the following best describes abstract classes? a) If a class has more than one virtual function, it’s abstract class b) If a class have only one pure virtual function, it’s abstract class c) If a class has at least one pure virtual function, it’s abstract class d) If a class has all the pure virtual functions only, then it’s abstract class Answer: c Explanation: The condition for a class to be called abstract class is that it must have at least one pure virtual function. The keyword abstract must be used while defining abstract class in java. 2. Can abstract class have main() function defined inside it? a) Yes, depending on return type of main() b) Yes, always c) No, main must not be defined inside abstract class d) No, because main() is not abstract function Answer: b Explanation: This is a property of abstract class. It can define main() function inside it. There is no restriction on its definition and implementation. 3. If there is an abstract method in a class then, ________________ a) Class must be abstract class b) Class may or may not be abstract class c) Class is generic d) Class must be public Answer: a Explanation: It is a rule that if a class have even one abstract method, it must be an abstract class. If this rule was not made, the abstract methods would have got skipped to get defined in some places which are undesirable with the idea of abstract class. 4. If a class is extending/inheriting another abstract class having abstract method, then _______________________ a) Either implementation of method or making class abstract is mandatory b) Implementation of the method in derived class is mandatory c) Making the derived class also abstract is mandatory d) It’s not mandatory to implement the abstract method of parent class Answer: a Explanation: Either of the two things must be done, either implementation or declaration of class as abstract. This is done to ensure that the method intended to be defined by other classes gets defined at every possible class. 5. Abstract class A has 4 virtual functions. Abstract class B defines only 2 of those member functions as it extends class A. Class C extends class B and implements the other two member functions of class A. Choose the correct option below. a) Program won’t run as all the methods are not defined by B b) Program won’t run as C is not inheriting A directly c) Program won’t run as multiple inheritance is used d) Program runs correctly Answer: d Explanation: The program runs correctly. This is because even class B is abstract so it’s not mandatory to define all the virtual functions. Class C is not abstract but all the virtual functions have been implemented will that class. 6. Abstract classes can ____________________ instances. a) Never have b) Always have c) Have array of d) Have pointer of Answer: a Explanation: When an abstract class is defined, it won’t be having the implementation of at least one function. This will restrict the class to have any constructor. When the class doesn’t have constructor, there won’t be any instance of that class. 7. We ___________________ to an abstract class. a) Can create pointers b) Can create references c) Can create pointers or references d) Can’t create any reference, pointer or instance Answer: c
Explanation: Even though there can’t be any instance of abstract class. We can always create pointer or reference to abstract class. The member functions which have some implementation inside abstract itself can be used with these references. 8. Which among the following is an important use of abstract classes? a) Header files b) Class Libraries c) Class definitions d) Class inheritance Answer: b Explanation: The abstract classes can be used to create a generic, extensible class library that can be used by other programmers. This helps us to get some already implemented codes and functions that might have not been provided by the programming language itself. 9. Use of pointers or reference to an abstract class gives rise to which among the following feature? a) Static Polymorphism b) Runtime polymorphism c) Compile time Polymorphism d) Polymorphism within methods Answer: b Explanation: The runtime polymorphism is supported by reference and pointer to an abstract class. This relies upon base class pointer and reference to select the proper virtual function. 10. The abstract classes in java can _________________ a) Implement constructors b) Can’t implement constructor c) Can implement only unimplemented methods d) Can’t implement any type of constructor Answer: a Explanation: The abstract classes in java can define a constructor. Even though instance can’t be created. But in this way, only during constructor chaining, constructor can be called. When instance of concrete implementation class is created, it’s known as constructor chaining. 11. Abstract class can’t be final in java. a) True b) False Answer: a Explanation: If an abstract class is made final in java, it will stop the abstract class from being extended. And if the class is not getting extended, there won’t be another class to implement the virtual functions. Due to this contradicting fact, it can’t be final in java. 12. Can abstract classes have static methods (Java)? a) Yes, always b) Yes, but depends on code c) No, never d) No, static members can’t have different values Answer: a Explanation: There is no restriction on declaring static methods. The only condition is that the virtual functions must have some definition in the program. 13. It is _________________________ to have an abstract method. a) Not mandatory for an static class b) Not mandatory for a derived class c) Not mandatory for an abstract class d) Not mandatory for parent class Answer: c Explanation: Derived, parent and static classes can’t have abstract method (We can’t say what type of these classes is). And for abstract class it’s not mandatory to have abstract method. But if any abstract method is there inside a class, then class must be abstract type. 14. How many abstract classes can a single program contain? a) At most 1 b) At least 1 c) At most 127 d) As many as required
Answer: d Explanation: There is no restriction on the number of abstract classes that can be defined inside a single program. The programs can use as many abstract classes as required. But the functions with no body must be implemented. 15. Is it necessary that all the abstract methods must be defined from an abstract class? a) Yes, depending on code b) Yes, always c) No, never d) No, if function is not used, no definition is required Answer: b Explanation: That is the rule of programming language that each function declared, must have some definition. There can’t be some abstract method that remains undefined. Even if it’s there, it would result in compile time error.
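A minimal sketch of an abstract class with a pure virtual function, showing that instances are forbidden while pointers (and runtime polymorphism) are allowed, as the questions above describe (the class names are illustrative):

#include <iostream>

class Shape {                         // abstract: contains a pure virtual function
public:
    virtual double area() const = 0;  // pure virtual function
    virtual ~Shape() {}
};

class Circle : public Shape {
    double r;
public:
    Circle(double radius) : r(radius) {}
    double area() const override { return 3.14159 * r * r; }
};

int main() {
    // Shape s;                       // error: cannot declare an instance of an abstract class
    Shape* p = new Circle(2.0);       // pointer to an abstract class is allowed
    std::cout << p->area() << '\n';   // call resolved at runtime (polymorphism)
    delete p;
    return 0;
}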
This set of Object Oriented Programming (OOPs) using C++ Multiple Choice Questions & Answers (MCQs) focuses on ” Abstract Function”. 1. Which among the following best defines the abstract methods? a) Functions declared and defined in base class b) Functions only declared in base class c) Function which may or may not be defined in base class d) Function which must be declared in derived class Answer: b Explanation: The abstract functions must only be declared in base class. Their definitions are provided by the derived classes. It is a mandatory condition. 2. Which among the following is true? a) The abstract functions must be only declared in derived classes b) The abstract functions must not be defined in derived classes c) The abstract functions must be defined in base and derived class d) The abstract functions must be defined either in base or derived class Answer: a Explanation: The abstract functions can’t be defined in base class. They are to be defined in derived classes. It is a rule for abstract functions. 3. How are abstract functions different from the abstract functions? a) Abstract must not be defined in base class whereas virtual function can be defined b) Either of those must be defined in base class c) Different according to definition d) Abstract functions are faster Answer: a Explanation: The abstract functions are only declared in base class. Derived classes have to implement those functions in order to inherit that base class. The functions are always defined in derived classes only. 4. Which among the following is correct? a) Abstract functions should not be defined in all the derived classes b) Abstract functions should be defined only in one derived class c) Abstract functions must be defined in base class d) Abstract functions must be defined in all the derived classes Answer: d Explanation: The abstract function are only declared in base classes and then has to be defined in all the derived classes. This allows all the derived classes to define own definition of any function whose declaration in base class might be common to all the other derived classes. 5. It is ____________________ to define the abstract functions. a) Mandatory for all the classes in program b) Necessary for all the base classes c) Necessary for all the derived classes d) Not mandatory for all the derived classes Answer: c Explanation: The derived classes must define the abstract function of base class in their own body. This is a necessary condition. Because the abstract functions doesn’t contain any definition in base class and hence becomes mandatory for the derived class to define them. All the functions in a program must have some definition. 6. The abstract function definitions in derived classes is enforced at _________ a) Runtime b) Compile time c) Writing code time d) Interpreting time Answer: b Explanation: When the program is compiled, these definitions are checked if properly defined. This compiler also ensure that the function is being defined by all the derived classes. Hence we get a compile time error if not done. 7. What is this feature of enforcing definitions of abstract function at compile time called? a) Static polymorphism b) Polymorphism c) Dynamic polymorphism d) Static or dynamic according to need
Answer: c Explanation: The feature is known as Dynamic polymorphism. Because the definitions are resolved at runtime. Even though the definitions are checked at compile time, they are resolved at runtime only. 8. What is the syntax for using abstract method? a) <access-modifier>abstract<return-type>method_name (parameter) b) abs<return-type>method name (parameter) c) <access-modifier>abstract return-type method name (parameter) d) <access-modifier>abstract <returning> method name (parameter) Answer: a Explanation: The syntax must firstly contain the access modifier. Then the keyword abstract is written to mention clearly to the compiler that it is an abstract method. Then prototype of the function with return type, function name and parameters. 9. If a function declared as abstract in base class doesn’t have to be defined in derived class then ______ a) Derived class must define the function anyhow b) Derived class should be made abstract class c) Derived class should not derive from that base class d) Derived class should not use that function Answer: b Explanation: If the function that is not to be defined in derived class but is declared as abstract in base class then the derived class must be made an abstract class. This will make the concept mandatory that the derived class must have one subclass to define that method. 10. Static methods can’t be made abstract in java. a) True b) False Answer: a Explanation: The abstract functions can’t be made static in a program. If those are made static then the function will be a property of class rather than each object. In turn ever object or derived class must use the common definition given in the base class. But abstract functions can’t be defined in the base class. Hence not possible. 11. Which among the following is true? a) Abstract methods can be static b) Abstract methods can be defined in derived class c) Abstract methods must not be static d) Abstract methods can be made static in derived class Answer: c Explanation: The abstract methods can never be made static. Even if it is in derived class, it can’t be made static. If this happens, then all the subsequent sub classes will have a common definition of abstract function which is not desirable. 12. Which among the following is correct for abstract methods? a) It must have different prototype in the derived class b) It must have same prototype in both base and derived class c) It must have different signature in derived class d) It must have same return type only Answer: b Explanation: The prototype must be the same. This is to override the function declared as abstract in base class. Or else it will not be possible to override the abstract function of base class and hence we get a compile time error. 13. If a class have all the abstract methods the class will be known as ___________ a) Abstract class b) Anonymous class c) Base class d) Derived class Answer: a Explanation: The classes containing all the abstract methods are known as abstract classes. And the abstract classes can never have any normal function with definition. Hence known as abstract class. 14. The abstract methods can never be ___________ in a base class. a) Private b) Protected c) Public d) Default Answer: a
Explanation: The base class must not contain the abstract methods. The methods have to be derived and defined in derived class. But if it is made private it can’t be inherited. Hence we can’t declare it as a private member. 15. The abstract method definition can be made ___________ in derived class. a) Private b) Protected c) Public d) Private, public, or protected Answer: d Explanation: The derived class implements the definition of the abstract methods of base class. Those can be made private in derived class if security is needed. There won’t be any problem in declaring it as private.
This set of Object Oriented Programming (OOPs) using C++ Multiple Choice Questions & Answers (MCQs) focuses on “Abstraction”. 1. Which among the following best defines abstraction? a) Hiding the implementation b) Showing the important data c) Hiding the important data d) Hiding the implementation and showing only the features Answer: d Explanation: It includes hiding the implementation part and showing only the required data and features to the user. It is done to hide the implementation complexity and details from the user. And to provide a good interface in programming. 2. Hiding the implementation complexity can ____________ a) Make the programming easy b) Make the programming complex c) Provide more number of features d) Provide better features Answer: a Explanation: It can make programming easy. The programming need not know how the inbuilt functions are working but can use those complex functions directly in the program. It doesn’t provide more number of features or better features. 3. Class is _________ abstraction. a) Object b) Logical c) Real d) Hypothetical Answer: b Explanation: Class is logical abstraction because it provides a logical structure for all of its objects. It gives an overview of the features of an object. 4. Object is ________ abstraction. a) Object b) Logical c) Real d) Hypothetical Answer: c Explanation: Object is real abstraction because it actually contains those features of class. It is the implementation of overview given by class. Hence the class is logical abstraction and its object is real. 5. Abstraction gives higher degree of ________ a) Class usage b) Program complexity c) Idealized interface d) Unstable interface Answer: c Explanation: It is to idealize the interface. In this way the programmer can use the programming features more efficiently and can code better. It can’t increase the program complexity, as the feature itself is made to hide it. 6. Abstraction can apply to ____________ a) Control and data b) Only data c) Only control d) Classes Answer: a Explanation: Abstraction applies to both. Control abstraction involves use of subroutines and control flow abstraction. Data abstraction involves handling pieces of data in meaningful ways. 7. Which among the following can be viewed as combination of abstraction of data and code. a) Class b) Object c) Inheritance d) Interfaces Answer: b Explanation: Object can be viewed as abstraction of data and code. It uses data members and their functioning as data
1. Which of the following is not an example of Social Media?
- Twitter
- Google
- Instagram
- YouTube
Answer
Google
2. By 2025, the volume of digital data will increase to
- TB
- YB
- ZB
- EB
Answer
ZB
3. Data Analysis is a process of
- inspecting data
- cleaning data
- transforming data
- All of the above
Answer
All of the above
4. Does Facebook uses “Big Data ” to perform the concept of Flashback?
- True
- False
Answer
True
5. Which of the following is not a major data analysis approach?
- Data Mining
- Predictive Intelligence
- Business Intelligence
- Text Analytics
Answer
Predictive Intelligence
6. The Process of describing the data that is huge and complex to store and process is known as
- Analytics
- Data mining
- Big data
- Data warehouse
Answer
Big data
7. How many main statistical methodologies are used in data analysis?
- 2
- 3
- 4
- 5
Answer
2
8. In descriptive statistics, data from the entire population or a sample is summarized with ?
- Integer descriptor
- floating descriptor
- numerical descriptor
- decimal descriptor
Answer
numerical descriptor
9. ____ have a structure but cannot be stored in a database.
- Structured
- Semi Structured
- Unstructured
- None of these
Answer
None of these
10. Data generated from online transactions is one of the examples of the volume of big data
- TRUE
- FALSE
Answer
TRUE
11. Velocity is the speed at which the data is processed
- True
- False
Answer
False
12. Value tells the trustworthiness of data in terms of quality and accuracy
- TRUE
- FALSE
Answer
False
13. Hortonworks was introduced by Cloudera and owned by Yahoo
- True
- False
Answer
False
14. ____ refers to the ability to turn your data into something useful for business
- Velocity
- variety
- Value
- Volume
Answer
Value
15. GFS consists of ____ Master and _____ Chunk Servers
- Single, Single
- Multiple, Single
- Single, Multiple
- Multiple, Multiple
Answer
Single, Multiple
16. Data Analysis is defined by the statistician?
- William S.
- Hans Peter Luhn
- Gregory Piatetsky-Shapiro
- John Tukey
Answer
John Tukey
17. Files are divided into ____ sized Chunks.
- Static
- Dynamic
- Fixed
- Variable
Answer
Fixed
18. _____ is an open source framework for storing data and running application on clusters of commodity hardware.
- HDFS
- Hadoop
- MapReduce
- Cloud
Answer
Hadoop
19. How much data (in MB) does HDFS store in each block by default, which can be scaled at any time?
- 32
- 64
- 128
- 256
Answer
128
20. Hadoop Map Reduce allows you to perform distributed parallel processing on large volumes of data quickly and efficiently.
- True
- False
Answer
True
21. Google Introduced Map Reduce Programming model in 2004
- True
- False
Answer
True
22. Hadoop YARN is used for Cluster Resource Management in Hadoop Ecosystem
- True
- False
Answer
True
23. _____ phase sorts the data & _____ creates logical clusters.
- Reduce, YARN
- MAP, YARN
- REDUCE, MAP
- MAP, REDUCE
Answer
MAP, REDUCE
24. There is only one operation between Mapping and Reducing
- True
- False
Answer
True
25. Which of the following is true about hypothesis testing?
- answering yes/no questions about the data
- estimating numerical characteristics of the data
- describing associations within the data
- modeling relationships within the data
Answer
answering yes/no questions about the data
26. The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities
- True
- False
Answer
True
27. ____ is one of the factors considered before adopting Big Data technology
- Validation
- Verification
- Data
- Design
Answer
Validation
28. ____ analytics is used for improving supply chain management to optimize stock management, replenishment, and forecasting
- Descriptive
- Diagnostic
- Predictive
- Prescriptive
Answer
Predictive
29. Which among the following is not a data mining and analytical application?
- profile matching
- social network analysis
- facial recognition
- Filtering
Answer
Filtering
30. _____ occurs as a result of data accessibility, data latency, data availability, or limits on bandwidth in relation to the size of inputs
- Computation-restricted throttling
- Large data volumes
- Data throttling
- Data Parallelization
Answer
Data throttling
31. As an example, an expectation of using a recommendation engine would be to increase same-customer sales by adding more items into the market basket. Which expected benefit does this illustrate?
- Lowering costs
- Increasing revenues
- Increasing productivity
- Reducing risk
Answer
Increasing revenues
32. Which characteristic of a storage subsystem allows it to support massive data volumes of increasing size?
- Extensibility
- Fault tolerance
- Scalability
- High-speed I/O capacity
Answer
Scalability
33. _____ provides performance through distribution of data and fault tolerance through replication
- HDFS
- PIG
- HIVE
- HADOOP
Answer
HDFS
34. ______ is a programming model for writing applications that can process Big Data in parallel on multiple nodes.
- HDFS
- MAP REDUCE
- HADOOP
- HIVE
Answer
MAP REDUCE
35. ____ takes the grouped key-value paired data as input and runs a Reducer function on each one of them.
- MAPPER
- REDUCER
- COMBINER
- PARTITIONER
Answer
REDUCER
36. ____ is a type of local Reducer that groups similar data from the map phase into identifiable sets.
- MAPPER
- REDUCER
- COMBINER
- PARTITIONER
Answer
COMBINER
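To see how the Mapper, Combiner, and Reducer in the three questions above fit together, here is a minimal, framework-free Python sketch of the classic word-count job. Plain functions stand in for Hadoop's Mapper/Combiner/Reducer classes; the function names and sample input are illustrative, not Hadoop's actual API.

```python
from collections import defaultdict

def mapper(line):
    # Emit a (word, 1) pair for every word in the input line.
    return [(word, 1) for word in line.split()]

def combiner(pairs):
    # Local reducer: pre-aggregates counts produced by one mapper
    # to reduce the volume of data shuffled across the network.
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def reducer(word, counts):
    # Receives every count for one key (word) after the shuffle/sort phase.
    return word, sum(counts)

if __name__ == "__main__":
    lines = ["big data big ideas", "big data tools"]
    # Map (and locally combine) each input split independently.
    mapped = [pair for line in lines for pair in combiner(mapper(line))]
    # Shuffle/sort: group the intermediate values by key.
    grouped = defaultdict(list)
    for word, count in mapped:
        grouped[word].append(count)
    # Reduce each group to the final count.
    print(dict(reducer(w, c) for w, c in grouped.items()))
    # -> {'big': 3, 'data': 2, 'ideas': 1, 'tools': 1}
```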
37. While installing Hadoop, how many XML configuration files are edited, and which are they?
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
- yarn-site.xml
Answer
All four: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml
38. Movie Recommendation systems are an example of
- Classification
- Clustering
- Reinforcement Learning
- Regression
- 2 only
- 1 and 3
- 1 and 2
- 2 and 3
Answer
1 and 3
39. Sentiment Analysis is an example of
- Regression
- Classification
- clustering
- Reinforcement Learning
- 1, 2 and 4
- 1, 2 and 3
- 1 and 3
- 1 and 2
Answer
1, 2 and 4
1. The branch of statistics which deals with development of particular statistical methods is classified as
- industry statistics
- economic statistics
- applied statistics
- applied statistics
Show Answer
applied statistics
2. Which of the following is true about regression analysis?
- answering yes/no questions about the data
- estimating numerical characteristics of the data
- modeling relationships within the data
- describing associations within the data
Show Answer
modeling relationships within the data
3. Text Analytics, also referred to as Text Mining?
- True
- False
- Can be true or False
- Can not say
Show Answer
TRUE
4. What is a hypothesis?
- A statement that the researcher wants to test through the data collected in a study.
- A research question the results will answer.
- A theory that underpins the study.
- A statistical method for calculating the extent to which the results could have happened by chance.
Show Answer
A statement that the researcher wants to test through the data collected in a study.
5. What is the cyclical process of collecting and analyzing data during a single research study called?
- Interim Analysis
- Inter analysis
- inter item analysis
- constant analysis
Show Answer
Interim Analysis
6. The process of quantifying data is referred to as ____
- Topology
- Digramming
- Enumeration
- coding
Show Answer
Enumeration
7. An advantage of using computer programs for qualitative data is that they _
- Can reduce time required to analyse data (i.e., after the data are transcribed)
- Help in storing and organising data
- Make many procedures available that are rarely done by hand due to time constraints
- All of the above
Show Answer
All of the Above
8. Boolean operators are words that are used to create logical combinations.
- True
- False
Show Answer
True
9. ______ are the basic building blocks of qualitative data.
- Categories
- Units
- Individuals
- None of the above
Show Answer
Categories
10. This is the process of transforming qualitative research data from written interviews or field notes into typed text.
- Segmenting
- Coding
- Transcription
- Mnemoning
Show Answer
Transcription
11. A challenge of qualitative data analysis is that it often includes data that are unwieldy and complex; it is a major challenge to make sense of the large pool of data.
- True
- False
Show Answer
True
12. Hypothesis testing and estimation are both types of descriptive statistics.
- True
- False
Show Answer
False
13. A set of data organised in a participants(rows)-by-variables(columns) format is known as a “data set.”
- True
- False
Show Answer
True
14. A graph that uses vertical bars to represent data is called a ___
- Line graph
- Bar graph
- Scatterplot
- Vertical graph
Show Answer
Bar graph
15. ____ are used when you want to visually examine the relationship between two quantitative variables.
- Bar graph
- pie graph
- line graph
- Scatterplot
Show Answer
Scatterplot
16. The denominator (bottom) of the z-score formula is
- The standard deviation
- The difference between a score and the mean
- The range
- The mean
Show Answer
The standard deviation
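As a quick worked check of that answer, the standard z-score formula divides the deviation of a score from the mean by the standard deviation (the numbers below are illustrative):

$$ z = \frac{x - \mu}{\sigma}, \qquad \text{e.g. } x = 70,\ \mu = 60,\ \sigma = 5 \ \Rightarrow\ z = \frac{70 - 60}{5} = 2 $$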
17. Which of these distributions is used for a testing hypothesis?
- Normal Distribution
- Chi-Squared Distribution
- Gamma Distribution
- Poisson Distribution
Show Answer
Chi-Squared Distribution
18. A statement made about a population for testing purpose is called?
- Statistic
- Hypothesis
- Level of Significance
- Test-Statistic
Show Answer
Hypothesis
19. If the assumed hypothesis is tested for rejection considering it to be true is called?
- Null Hypothesis
- Statistical Hypothesis
- Simple Hypothesis
- Composite Hypothesis
Show Answer
Null Hypothesis
20. If the null hypothesis is false then which of the following is accepted?
- Null Hypothesis
- Positive Hypothesis
- Negative Hypothesis
- Alternative Hypothesis.
Show Answer
Alternative Hypothesis.
21. Alternative Hypothesis is also called as?
- Composite hypothesis
- Research Hypothesis
- Simple Hypothesis
- Null Hypothesis
Show Answer
Research Hypothesis
- 0
- 1
- 2
- 3
Answer
1
2. For two runs of K-Means clustering, is it expected to get the same clustering results?
- Yes
- No
Answer
No
3. Which of the following algorithm is most sensitive to outliers?
- K-means clustering algorithm
- K-medians clustering algorithm
- K-modes clustering algorithm
- K-medoids clustering algorithm
Answer
K-means clustering algorithm
4. The discrete variables and continuous variables are two types of
- Open end classification
- Time series classification
- Qualitative classification
- Quantitative classification
Answer
Quantitative classification
5. Bayesian classifiers is
- A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
- Any mechanism employed by a learning system to constrain the search space of a hypothesis
- An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
- None of these
Answer
A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
6. Classification accuracy is
- A subdivision of a set of examples into a number of classes
- Measure of the accuracy, of the classification of a concept that is given by a certain theory
- The task of assigning a classification to a set of examples
- None of these
Answer
Measure of the accuracy, of the classification of a concept that is given by a certain theory
7. Euclidean distance measure is
- A stage of the KDD process in which new data is added to the existing selection.
- The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
- The distance between two points as calculated using the Pythagoras theorem
- none of above
Answer
The distance between two points as calculated using the Pythagoras theorem
8. Hybrid is
- Combining different types of method or information
- Approach to the design of learning algorithms that is structured along the lines of the theory of evolution.
- Decision support systems that contain an information base filled with the knowledge of an expert formulated in terms of if-then rules.
- none of above
Answer
Combining different types of method or information
9. Decision trees use ______ , in that they always choose the option that seems the best available at that moment.
- Greedy Algorithms
- divide and conquer
- Backtracking
- Shortest path algorithm
Answer
Greedy Algorithms
10. Discovery is
- It is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information).
- The process of extracting implicit, previously unknown and potentially useful information from data
- An extremely complex molecule that occurs in human chromosomes and that carries genetic information in the form of genes.
- None of these
Answer
The process of extracting implicit, previously unknown and potentially useful information from data
11. Hidden knowledge referred to
- A set of databases from different vendors, possibly using different database paradigms
- An approach to a problem that is not guaranteed to work but performs well in most cases
- Information that is hidden in a database and that cannot be recovered by a simple SQL query.
- None of these
Answer
Information that is hidden in a database and that cannot be recovered by a simple SQL query.
12. Decision trees cannot handle categorical attributes with many distinct values, such as country codes for telephone numbers.
- True
- False
Answer
False
13. Enrichment is
- A stage of the KDD process in which new data is added to the existing selection
- The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
- The distance between two points as calculated using the Pythagoras theorem.
- None of these
Answer
A stage of the KDD process in which new data is added to the existing selection
14. _____ are easy to implement and can execute efficiently even without prior knowledge of the data; they are among the most popular algorithms for classifying text documents.
- ID3
- Naïve Bayes classifiers
- CART
- None of above
Answer
Naïve Bayes classifiers
15. High entropy means that the partitions in classification are
- Pure
- Not Pure
- Useful
- Useless
Answer
Not Pure
16. Which of the following statements about Naive Bayes is incorrect?
- Attributes are equally important.
- Attributes are statistically dependent of one another given the class value.
- Attributes are statistically independent of one another given the class value.
- Attributes can be nominal or numeric
Answer
Attributes are statistically dependent of one another given the class value.
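The conditional-independence assumption behind that answer is the standard Naive Bayes factorization: given the class value C, the attributes x1, ..., xn are treated as statistically independent of one another:

$$ P(C \mid x_1, \dots, x_n) \;\propto\; P(C)\prod_{i=1}^{n} P(x_i \mid C) $$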
17. The maximum value for entropy depends on the number of classes so if we have 8 Classes what will be the max entropy.
- Max Entropy is 1
- Max Entropy is 2
- Max Entropy is 3
- Max Entropy is 4
Answer
Max Entropy is 3
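A short worked check of that answer, using the standard entropy formula: entropy is maximized when all K classes are equally likely, so

$$ H_{\max} = -\sum_{k=1}^{K} \tfrac{1}{K}\log_2 \tfrac{1}{K} = \log_2 K = \log_2 8 = 3 \text{ bits} $$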
18. Point out the wrong statement.
- k-nearest neighbor is same as k-means
- k-means clustering is a method of vector quantization
- k-means clustering aims to partition n observations into k clusters
- none of the mentioned
Answer
k-nearest neighbor is same as k-means
19. Consider the following example: “How can we divide a set of articles such that those articles have the same theme (we do not know the theme of the articles ahead of time)?” Is this:
- Clustering
- Classification
- Regression
- None of these
Answer
Clustering
20. Can we use K Mean Clustering to identify the objects in video?
- Yes
- No
Answer
Yes
21. Clustering techniques are ______ in the sense that the data scientist does not determine, in advance, the labels to apply to the clusters.
- Unsupervised
- supervised
- Reinforcement
- Neural network
Answer
Unsupervised
22. _____ metric is examined to determine a reasonably optimal value of k.
- Mean Square Error
- Within Sum of Squares (WSS)
- Speed
- None of these
Answer
Within Sum of Squares (WSS)
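A minimal sketch of how WSS is typically examined to choose k (the elbow method), assuming scikit-learn is available; `inertia_` is scikit-learn's name for the within-cluster sum of squares, and the data below is illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative 2-D data: three loose blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 5, 10)])

# Compute WSS for a range of k and look for the "elbow" where
# the decrease in WSS starts to flatten out.
for k in range(1, 7):
    wss = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(wss, 1))
```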
23. If an itemset is considered frequent, then any subset of the frequent itemset must also be frequent.
- Apriori Property
- Downward Closure Property
- Either 1 or 2
- Both 1 and 2
Answer
Both 1 and 2
24. if {bread,eggs,milk} has a support of 0.15 and {bread,eggs} also has a support of 0.15, the confidence of rule {bread,eggs}→{milk} is
- 0
- 1
- 2
- 3
Answer
1
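A worked check of that answer, using the standard definition of confidence for an association rule:

$$ \mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)} = \frac{\mathrm{supp}(\{bread, eggs, milk\})}{\mathrm{supp}(\{bread, eggs\})} = \frac{0.15}{0.15} = 1 $$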
25. Confidence is a measure of how X and Y are really related rather than coincidentally happening together.
- True
- False
Answer
False
26. ______ recommend items based on similarity measures between users and/or items.
- Content Based Systems
- Hybrid System
- Collaborative Filtering Systems
- None of these
Answer
Collaborative Filtering Systems
27. There are ______ major classifications of Collaborative Filtering mechanisms
- 1
- 2
- 3
- none of above
Answer
2
28. Movie Recommendation to people is an example of
- User Based Recommendation
- Item Based Recommendation
- Knowledge Based Recommendation
- content based recommendation
Answer
Item Based Recommendation
29. _____ recommenders rely on an explicitly defined set of recommendation rules
- Constraint Based
- Case Based
- Content Based
- User Based
Answer
Constraint Based
30. Parallelized hybrid recommender systems operate dependently of one another and produce separate recommendation lists.
- True
- False
Answer
False
- Address
- Contents
- Both a and b
- none
Show Answer
Both a and b
2. A collection of lines that connects several devices is called
- Bus
- Peripheral connection wires
- Both a and b
- internal wires
Show Answer
Bus
3. Conventional architectures coarsely comprise of a
- Processor
- Memory System
- Data path
- All of the above
Show Answer
All of the above
4. VLIW processors rely on
- Compile time analysis
- Initial time analysis
- Final time analysis
- id time analysis
Show Answer
Compile time analysis
5. HPC is not used in high span bridges
- True
- False
Show Answer
False
6. The access time of memory is …………… the time required for performing any single CPU operation.
- longer than
- shorter than
- negligible than
- same as
Show Answer
longer than
7. Data intensive applications utilize_
- High aggregate throughput
- High aggregate network bandwidth
- high processing and memory system performance
- none of above
Show Answer
High aggregate throughput
8. Memory system performance is largely captured by_
- Latency
- bandwidth
- both a and b
- none of above
Show Answer
both a and b
9. A processor performing fetch or decoding of different instruction during the execution of another instruction is called __ .
- Super-scaling
- Pipe-lining
- Parallel Computation
- none of above
Show Answer
Pipe-lining
10. For a given FINITE number of instructions to be executed, which architecture of the processor provides for a faster execution ?
- ISA
- ANSA
- Super-scalar
- All of the above
Show Answer
Super-scalar
11. HPC works out to be economical.
- True
- false
Show Answer
True
12. High Performance Computing of the Computer System tasks are done by
- Node Cluster
- Network Cluster
- Beowulf Cluster
- Stratified Cluster
Show Answer
Beowulf Cluster
13. Octa Core Processors are the processors of the computer system that contains
- 2 Processors
- 4 Processors
- 6 Processors
- 8 Processors
Show Answer
8 Processors
14. Parallel computing uses _ execution
- sequential
- unique
- simultaneous
- None of above
Show Answer
simultaneous
15. Which of the following is NOT a characteristic of parallel computing?
- Breaks a task into pieces
- Uses a single processor or computer
- Simultaneous execution
- May use networking
Show Answer
Uses a single processor or computer
16. Which of the following is true about parallel computing performance?
- Computations use multiple processors
- There is an increase in speed
- The increase in speed is loosely tied to the number of processor or computers used
- All of the answers are correct.
Show Answer
All of the answers are correct.
17. __ leads to concurrency.
- Serialization
- Parallelism
- Serial processing
- Distribution
Show Answer
Parallelism
18. MIPS stands for?
- Mandatory Instructions/sec
- Millions of Instructions/sec
- Most of Instructions/sec
- Many Instructions / sec
Show Answer
Millions of Instructions/sec
19. Which MIMD systems are best scalable with respect to the number of processors
- Distributed memory computers
- consume systems
- Symmetric multiprocessors
- None of above
Show Answer
Distributed memory computers
20. To which class of systems does the von Neumann computer belong?
- SIMD (Single Instruction Multiple Data)
- MIMD (Multiple Instruction Multiple Data)
- MISD (Multiple Instruction Single Data)
- SISD (Single Instruction Single Data)
Show Answer
SISD (Single Instruction Single Data)
21. Which of the architecture is power efficient?
- CISC
- RISC
- ISA
- IANA
Show Answer
RISC
22. Pipe-lining is a unique feature of _.
- RISC
- CISC
- ISA
- IANA
Show Answer
RISC
23. The computer architecture aimed at reducing the time of execution of instructions is __.
- RISC
- CISC
- ISA
- IANA
Show Answer
RISC
24. Type of microcomputer memory is
- processor memory
- primary memory
- secondary memory
- All of above
Show Answer
All of above
25. A pipeline is like_
- Overlaps various stages of instruction execution to achieve performance.
- House pipeline
- Both a and b
- A gas line
Show Answer
Overlaps various stages of instruction execution to achieve performance.
26. Scheduling of instructions is determined by_
- True Data Dependency
- Resource Dependency
- Branch Dependency
- All of above
Show Answer
All of above
27. The fraction of data references satisfied by the cache is called_
- Cache hit ratio
- Cache fit ratio
- Cache best ratio
- none of above
Show Answer
Cache hit ratio
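For context, the hit ratio h (cache hits divided by total memory references) feeds directly into the usual textbook estimate of effective memory access time; the symbols below are illustrative:

$$ t_{\text{eff}} = h\, t_{\text{cache}} + (1 - h)\, t_{\text{memory}} $$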
28. A single control unit that dispatches the same Instruction to various processors is__
- SIMD
- SPMD
- MIMD
- none of above
Show Answer
SIMD
29. The primary forms of data exchange between parallel tasks are_
- Accessing a shared data space
- Exchanging messages.
- Both A and B
- none of above
Show Answer
Both A and B
30. Switches map a fixed number of inputs to outputs.
- True
- False
Show Answer
True
1. The First step in developing a parallel algorithm is_
- To Decompose the problem into tasks that can be executed concurrently
- Execute directly
- Execute indirectly
- None of Above
Answer
To Decompose the problem into tasks that can be executed concurrently
2. The number of tasks into which a problem is decomposed determines its_
- Granularity
- Priority
- Modernity
- None of Above
Answer
Granularity
3. The length of the longest path in a task dependency graph is called_
- the critical path length
- the critical data length
- the critical bit length
- None of Above
Answer
the critical path length
4. The graph of tasks (nodes) and their interactions/data exchange (edges)_
- Is referred to as a task interaction graph
- Is referred to as a task Communication graph
- Is referred to as a task interface graph
- None of Above
Answer
Is referred to as a task interaction graph
5. Mappings are determined by_
- task dependency
- task interaction graphs
- Both A and B
- None of Above
Answer
Both A and B
6. Decomposition Techniques are_
- recursive decomposition
- data decomposition
- exploratory decomposition
- speculative decomposition
- All of above
Answer
All of above
7. The Owner Computes rule generally states that the process assigned a particular data item is responsible for _
- All computation associated with it
- Only one computation
- Only two computation
- Only occasionally computation
Answer
All computation associated with it
8. A simple application of exploratory decomposition is_
- The solution to a 15 puzzle
- The solution to 20 puzzle
- The solution to any puzzle
- None of Above
Answer
The solution to a 15 puzzle
9. Speculative Decomposition consist of _
- conservative approaches
- optimistic approaches
- Both A and B
- only B
Answer
Both A and B
10. task characteristics include:
- Task generation.
- Task sizes.
- Size of data associated with tasks.
- All of above
Answer
All of above
11. What is a high performance multi-core processor that can be used to accelerate a wide variety of applications using parallel computing.
- CLU
- GPU
- CPU
- DSP
Answer
GPU
12. What is GPU?
- Grouped Processing Unit
- Graphics Processing Unit
- Graphical Performance Utility
- Graphical Portable Unit
Answer
Graphics Processing Unit
13. The code that runs on a GPU, known as a GRID, consists of a set of
- 32 Thread
- 32 Block
- Unit Block
- Thread Block
Answer
Thread Block
14. Interprocessor communication takes place via
- Centralized memory
- Shared memory
- Message passing
- Both A and B
Answer
Both A and B
15. Decomposition into a large number of tasks results in coarse-grained decomposition
- True
- False
Answer
False
16. Relevant task characteristics include
- Task generation.
- Task sizes
- Size of data associated with tasks
- Overhead
- both A and B
Answer
both A and B
17. The fetch and execution cycles are interleaved with the help of __
- Modification in processor architecture
- Clock
- Special unit
- Control unit
Answer
Clock
18. The processor of system which can read /write GPU memory is known as
- kernal
- device
- Server
- Host
Answer
Host
19. Increasing the granularity of decomposition and utilizing the resulting concurrency to perform more tasks in parallel decreases performance.
- TRUE
- FALSE
Answer
FALSE
20. If there is dependency between tasks, it implies there is no need of interaction between them.
- TRUE
- FALSE
Answer
FALSE
21. Parallel quick sort is example of task parallel model
- TRUE
- FALSE
Answer
TRUE
22. True Data Dependency is
- The result of one operation is an input to the next.
- Two operations require the same resource.
Answer
The result of one operation is an input to the next.
23. What is Granularity ?
- The size of database
- The size of data item
- The size of record
- The size of file
Answer
The size of data item
24. In coarse-grained parallelism, a program is split into …………………… task and ……………………… Size
- Large tasks , Smaller Size
- Small Tasks , Larger Size
- Small Tasks , Smaller Size
- Equal task, Equal Size
Answer
Large tasks , Smaller Size
1. The primary and essential mechanism to support the sparse matrices is
- Gather-scatter operations
- Gather operations
- Scatter operations
- Gather-scatter technique
Answer
Gather-scatter operations
2. In the gather operation, a single node collects a ———
- Unique message from each node
- Unique message from only one node
- Different message from each node
- None of Above
Answer
Unique message from each node
3. In the scatter operation, a single node sends a ————
- Unique message of size m to every other node
- Different message of size m to every other node
- Different message of different size m to every other node
- All of Above
Answer
Unique message of size m to every other node
4. Is all-to-all broadcasting the same as all-to-all personalized communication?
- Yes
- No
Answer
No
5. Is the scatter operation the same as broadcast?
- Yes
- No
Answer
No
6. All-to-all personalized communication is also known as
- Total Exchange
- Personal Message
- Scatter
- Gather
Answer
Total Exchange
7. In which way is the scatter operation different from broadcast?
- Message size
- Number of nodes
- Same
- None of above
Answer
Message size
8. The gather operation is exactly the _ of the scatter operation
- Inverse
- Reverse
- Multiple
- Same
Answer
Inverse
9. The gather operation is exactly the inverse of the_
- Scatter operation
- Broadcast operation
- Prefix Sum
- Reduction operation
Answer
Scatter operation
10. The dual of one-to-all broadcast is all-to-one reduction. True or False?
- TRUE
- FALSE
Answer
TRUE
11. A binary tree in which processors are (logically) at the leaves and internal nodes are routing nodes.
- TRUE
- FALSE
Answer
TRUE
12. Group communication operations are built using point-to-point messaging primitives
- TRUE
- FALSE
Answer
TRUE
13. Communicating a message of size m over an uncongested network takes time ts + tw*m
- True
- False
Answer
True
14. Parallel programs: Which speedup could be achieved according to Amdahl´s law for infinite number of processors if 5% of a program is sequential and the remaining part is ideally parallel?
- Infinite speedup
- 5
- 20
- None of above
Answer
20
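A worked derivation of that answer from Amdahl's law, with serial fraction f = 0.05:

$$ S(p) = \frac{1}{f + \frac{1-f}{p}} \;\longrightarrow\; \frac{1}{f} = \frac{1}{0.05} = 20 \quad \text{as } p \to \infty $$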
15. Shift register that performs a circular shift is called
- Invalid Counter
- Valid Counter
- Ring
- Undefined
Answer
Ring
16. 8 bit information can be stored in
- 2 Registers
- 4 Registers
- 6 Registers
- 8 Registers
Answer
8 Registers
17. The result of prefix expression * / b + – d a c d, where a = 3, b = 6, c = 1, d = 5 is
- 0
- 5
- 10
- 8
Answer
10
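A step-by-step evaluation of that prefix expression (expanding it recursively) with a = 3, b = 6, c = 1, d = 5:

$$ *\; /\; b\; +\; -\; d\; a\; c\; d \;=\; \big(b / ((d - a) + c)\big) \times d \;=\; \big(6 / ((5 - 3) + 1)\big) \times 5 \;=\; 2 \times 5 \;=\; 10 $$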
18. The height of a binary tree is the maximum number of edges in any root to leaf path. The maximum number of nodes in a binary tree of height h is?
- 2^h - 1
- 2^(h-1) - 1
- 2^(h+1) - 1
- 2*(h+1)
Answer
2^(h+1) - 1
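That count follows from summing the nodes level by level in a full binary tree of height h:

$$ N_{\max} = \sum_{i=0}^{h} 2^{i} = 2^{h+1} - 1, \qquad \text{e.g. } h = 3 \Rightarrow 1 + 2 + 4 + 8 = 15 = 2^{4} - 1 $$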
19. A hypercube has_
- 2^d nodes
- 2d nodes
- 2n Nodes
- N Nodes
Answer
2^d nodes
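As a quick check: a d-dimensional hypercube has one node for every d-bit label, so

$$ \text{nodes} = 2^{d}, \qquad \text{e.g. } d = 3 \Rightarrow 2^{3} = 8 \text{ nodes, each with } d = 3 \text{ neighbours} $$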
20. The Prefix Sum Operation can be implemented using the_
- All-to-all broadcast kernel
- All-to-one broadcast kernel
- One-to-all broadcast Kernel
- Scatter Kernel
Answer
All-to-all broadcast kernel
21.In the scatter operation_
- Single node send a unique message of size m to every other node
- Single node send a same message of size m to every other node
- Single node send a unique message of size m to next node
- None of Above
Answer
Single node send a unique message of size m to every other node
22. In All-to-All Personalized Communication Each node has a distinct message of size m for every other node
- True
- False
Answer
True
23. A binary tree in which processors are (logically) at the leaves and internal nodes are
routing nodes.
- True
- False
Answer
True
24. In All-to-All Broadcast each processor is the source as well as destination.
- True
- False
Answer
True
Unit 1
1. Data Analysis is defined by the statistician?
a. William S. b. Hans Peter Luhn c. Gregory Piatetsky-Shapiro d. John Tukey
Ans D
2. What is classification?
a) deciding what features to use in a pattern recognition problem b) deciding what class an input pattern belongs to c) deciding what type of neural network to use d) none of the mentioned
Ans. B
3. Data in ___________ bytes size is called Big Data.
A. Tera B. Giga C. Peta D. Meta
Ans : C
Explanation: Data in Petabytes, i.e. 10^15 bytes in size, is called Big Data.
4. How many V's of Big Data are there?
A. 2 B. 3 C. 4 D. 5
Ans : D
Explanation: Big Data was defined by the “3Vs” but now there are “5Vs” of Big Data which are Volume, Velocity, Variety, Veracity, Value
5. Transaction data of the bank is?
A. structured data B. unstructured data
C. Both A and B D. None of the above
Ans : A
Explanation: Data which can be saved in tables is structured data, like the transaction data of the bank.
6. In how many forms can Big Data be found?
A. 2 B. 3 C. 4 D. 5
Ans : B
Explanation: Big Data can be found in three forms: Structured, Unstructured and Semi-structured.
7. Which of the following are Benefits of Big Data Processing?
A. Businesses can utilize outside intelligence while taking decisions B. Improved customer service C. Better operational efficiency D. All of the above
Ans : D
Explanation: All of the above are Benefits of Big Data Processing.
8. Which of the following are incorrect Big Data Technologies?
A. Apache Hadoop B. Apache Spark C. Apache Kafka D. Apache Pytarch
Ans : D
Explanation: Apache Pytarch is not a Big Data technology.
9. The overall percentage of the world’s total data that has been created just within the past two years is?
A. 80% B. 85%
C. 90% D. 95%
Ans : C
Explanation: The overall percentage of the world’s total data created just within the past two years is 90%.
10. Apache Kafka is an open-source platform that was created by?
A. LinkedIn B. Facebook C. Google D. IBM
Ans : A
Explanation: Apache Kafka is an open-source platform that was created by LinkedIn in the year 2011.
11. What was Hadoop named after?
A. Creator Doug Cutting’s favorite circus act B. Cuttings high school rock band C. The toy elephant of Cutting’s son D. A sound Cutting’s laptop made during Hadoop development
Ans : C
Explanation: Doug Cutting, Hadoop’s creator, named the framework after his child’s stuffed toy elephant.
12. What are the main components of Big Data?
A. MapReduce B. HDFS C. YARN D. All of the above
Ans : D
Explanation: All of the above are the main components of Big Data.
13. Point out the correct statement.
A. Hadoop do need specialized hardware to process the data B. Hadoop 2.0 allows live stream processing of real time data C. In Hadoop programming framework output files are divided into lines or records D. None of the above
Ans : B
Explanation: Hadoop batch-processes data distributed over a number of computers, ranging in the 100s and 1000s.
14. Which of the following fields come under the umbrella of Big Data?
A. Black Box Data B. Power Grid Data C. Search Engine Data D. All of the above
Ans : D
Explanation: All options are the fields come under the umbrella of Big Data.
15. Which of the following is not an example of Social Media? 1. Twitter 2. Google 3. Instagram 4. Youtube
Ans: 2 (Google)
16. By 2025, the volume of digital data will increase to 1. TB 2. YB 3. ZB 4. EB Ans: 3 ZB
17. Data Analysis is a process of 1. inspecting data 2. cleaning data 3. transforming data 4. All of Above
Ans. 4 All of above
18. Which of the following is not a major data analysis approaches? 1. Data Mining 2. Predictive Intelligence 3. Business Intelligence
4. Text Analytics
Ans. 2 Predictive Intelligence
19. The Process of describing the data that is huge and complex to store and process is known as 1. Analytics 2. Data mining 3. Big data 4. Data warehouse
Ans. 3 Big data
20. In descriptive statistics, data from the entire population or a sample is summarized with ? 1. Integer descriptor 2. floating descriptor 3. numerical descriptor 4. decimal descriptor
Ans. 3 numerical descriptor
21. Data generated from online transactions is one of the example for volume of big data 1. TRUE 2. FALSE
TRUE
22. Velocity is the speed at which the data is processed 1. True 2. False
False
23. Value tells the trustworthiness of data in terms of quality and accuracy 1. TRUE 2. FALSE
False
24. Hortonworks was introduced by Cloudera and owned by Yahoo 1. True 2. False
False
25. ____ refers to the ability to turn your data useful for business 1. Velocity 2. variety 3. Value 4. Volume
Ans. 3 Value
26. Data Analysis is defined by the statistician? 1. William S. 2. Hans Peter Luhn 3. Gregory Piatetsky-Shapiro 4. John Tukey
Ans. 4 John Tukey
27. Files are divided into ____ sized Chunks. 1. Static 2. Dynamic 3. Fixed 4. Variable
Ans. 3 Fixed
28. _____ is an open source framework for storing data and running application on clusters of commodity hardware. 1. HDFS 2. Hadoop 3. MapReduce 4. Cloud
Ans. 2 Hadoop
29. ____ is factors considered before Adopting Big Data Technology 1. Validation 2. Verification 3. Data 4. Design
Ans. 1 Validation
30. Which among the following is not a Data mining and analytical applications? 1. profile matching
2. social network analysis 3. facial recognition 4. Filtering
Ans. 4 Filtering
31. Which storage subsystem can support massive data volumes of increasing size. 1. Extensibility 2. Fault tolerance 3. Scalability 4. High-speed I/O capacity
Ans. 3 Scalability
32. ______ is a programming model for writing applications that can process Big Data in parallel on multiple nodes. 1. HDFS 2. MAP REDUCE 3. HADOOP 4. HIVE Ans. MAP REDUCE
33. How many main statistical methodologies are used in data analysis?
A. 2 B. 3 C. 4 D. 5
Ans : A
Explanation: In data analysis, two main statistical methodologies are used Descriptive statistics and Inferential statistics.
34. The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
A. TRUE B. FALSE C. Can be true or false D. Can not say
Ans : A
Explanation: The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
35. The branch of statistics which deals with development of particular statistical methods is classified as 1. industry statistics 2. economic statistics 3. applied statistics 4. applied statistics
Ans. applied statistics
36. Point out the correct statement. a) Descriptive analysis is first kind of data analysis performed b) Descriptions can be generalized without statistical modelling c) Description and Interpretation are same in descriptive analysis d) None of the mentioned
Answer: b Explanation: Descriptive analysis describe a set of data.
37. What are the five V’s of Big Data?
A. Volume
B. Velocity
C. Variety
D. All the above
Answer: Option D
38. What are the main components of Big Data?
A. MapReduce
B. HDFS
C. YARN
D. All of these
Answer: Option D
39. What are the different features of Big Data Analytics?
A. Open-Source
B. Scalability
C. Data Recovery
D. All the above
Answer: Option D
40. Which of the following refers to the problem of finding abstracted patterns (or structures) in the unlabeled data?
A. Supervised learning
B. Unsupervised learning
C. Hybrid learning
D. Reinforcement learning
Answer: B
Explanation: Unsupervised learning is a type of machine learning algorithm that is generally used to find the hidden structured and patterns in the given unlabeled data.
41. Which one of the following refers to querying the unstructured textual data?
A. Information access
B. Information update
C. Information retrieval
D. Information manipulation
Answer: C
Explanation: Information retrieval refers to querying the unstructured textual data. We can also understand information retrieval as an activity (or process) in which the tasks of obtaining information from system recourses that are relevant to the information required from the huge source of information.
42. For what purpose, the analysis tools pre-compute the summaries of the huge amount of data?
A. In order to maintain consistency
B. For authentication
C. For data access
D. To obtain the queries response
Answer: D
Explanation: Whenever a query is fired, its response must be produced quickly; for this reason, the analysis tools pre-compute summaries of the huge amount of data.
43. Which one of the following statements is not correct about the data cleaning?
A. It refers to the process of data cleaning
B. It refers to the transformation of wrong data into correct data
C. It refers to correcting inconsistent data
D. All of the above
Answer: d
Explanation: Data cleaning is a kind of process that is applied to data set to remove the noise from the data (or noisy data), inconsistent data from the given data. It also involves the process of transformation where wrong data is transformed into the correct data as well. In other words, we can also say that data cleaning is a kind of pre-process in which the given set of data is prepared for the data warehouse.
44. Any data with unknown form or the structure is classified as _ data. a. Structured b. Unstructured c. Semi-structured d. None of above Ans. b
45.____ means relating to the issuing of reports. a. Analysis b. Reporting c. Reporting and Analysis d. None of the above
Ans. b
46. Veracity involves the reliability of the data; this is ________ due to the numerous data sources of big data a) Easy and difficulty b) Easiness c) Demanding d) none of these
Ans. c
47. ____ is a process of defining the measurement of a phenomenon that is not directly measurable, though its existence is implied by other phenomena. a. Data preparation b. Model planning c. Communicating results d. Operationalization
Ans. d
48. _____data is data whose elements are addressable for effective analysis.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. a
49. ______data is information that does not reside in a relational database but that have some organizational properties that make it easier to analyze.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. b
50. ______data is a data which is not organized in a predefined manner or does not have a predefined data model, thus it is not a good fit for a mainstream relational database.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. c
51. There are ___ types of big data.
a. 2
b. 3 c. 4 d. 5
Ans. b
52. Google search is an example of _________ data.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. c
UNIT 2
1. Sentiment Analysis is an example of 1. Regression 2. Classification 3. clustering 4. Reinforcement Learning
1. 1, 2 and 4 2. 1, 2 and 3 3. 1 and 3 4. 1 and 2
Show Answer Ans. 1, 2 and 4
2. The self-organizing maps can also be considered as the instance of _________ type of learning.
A. Supervised learning B. Unsupervised learning C. Missing data imputation D. Both A & C
Answer: B Explanation: The Self Organizing Map (SOM), or the Self Organizing Feature Map is a kind of Artificial Neural Network which is trained through unsupervised learning.
3. The following given statement can be considered as the examples of_________
Suppose one wants to predict the number of newborns according to the size of storks' population by performing supervised learning
A. Structural equation modeling B. Clustering C. Regression D. Classification
Answer: C
Explanation: The above-given statement can be considered as an example of regression. Therefore the correct answer is C.
4. In the example predicting the number of newborns, the final number of total newborns can be considered as the _________
A. Features B. Observation C. Attribute D. Outcome
Answer: D
Explanation: In the example of predicting the total number of newborns, the result will be represented as the outcome. Therefore, the total number of newborns will be found in the outcome or addressed by the outcome.
5. Which of the following statement is true about the classification?
A. It is a measure of accuracy B. It is a subdivision of a set C. It is the task of assigning a classification D. None of the above
Answer: B
Explanation: The term "classification" refers to the classification of the given data into certain sub-classes or groups according to their similarities or on the basis of the specific given set of rules.
6. Which one of the following correctly refers to the task of the classification?
A. A measure of the accuracy, of the classification of a concept that is given by a certain theory B. The task of assigning a classification to a set of examples C. A subdivision of a set of examples into a number of classes D. None of the above
Answer: B
Explanation: The task of classification refers to assigning each example in a set to one of a number of classes; therefore the correct answer is B.
7. _____is an observation which contains either very low value or very high value in comparison to other observed values. It may hamper the result, so it should be avoided. a. Dependent Variable b. Independent Variable c. Outlier Variable d. None of the above Ans. c
8. _______is a type of regression which models the non-linear dataset using a linear model.
a. Polynomial Regression b. Logistic Regression c. Linear Regression d. Decision Tree Regression
Ans. a
9. The prediction of the weight of a person when his height is known, is a simple example of regression. The function used in R language is_____.
a. lm() b. print() c. predict() d. summary( )
Ans. c
10. There is the following syntax of lm() function in multiple regression.
lm(y ~ x1+x2+x3...., data) a. y is predictor and x1,x2,x3 are the dependent variables. b. y is dependent and x1,x2,x3 are the predictors. c. data is predictor variable. d. None of the above.
Ans. b
11. _______is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph.
a. A Bayesian network b. Bayes Network c. Bayesian Model
d. All of the above
Ans. d
12. In support vector regression, _____is a function used to map lower dimensional data into higher dimensional data
A) Boundary line B) Kernel C) Hyper Plane D) Support Vector Ans. B
13. If the independent variables are highly correlated with each other, then such a condition is called___________ a) outlier b) Multicollinearity c) under fitting d) independent variable
Ans. b
14. The Bayesian network graph does not contain any cyclic graph. Hence, it is known as a ____ or_____.
a. Directed Acyclic Graph or DAG b. Directed Cyclic Graph or DCG. c. Both the above. d. None of the above.
Ans. a
15. The hyperplane with maximum margin is called the ______ hyperplane. a. Non-optimal b. Optimal c. None of the above d. Requires one more option
Ans. b
16. One more _____ is needed for non-linear SVM.
a. Dimension b. Attribute c. Both the above d. None of the above
Ans. a
17. A subset of dataset to train the machine learning model, and we already know the output.
a. Training set b. Test set c. Both the above d. None of the above
Ans. a
18. ______is the final step of data preprocessing in machine learning. It is a technique to standardize the independent variables of the dataset in a specific range. In_____, we put our variables in the same range and in the same scale so that no any variable dominate the other variable
a. Feature Sampling b. Feature Scaling c. None of the above d. Both the above
Ans. b
19. Principal components analysis (PCA) is a statistical technique that allows identifying underlying linear patterns in a data set so it can be expressed in terms of other data set of a significantly ____ dimension without much loss of information. a. Lower b. Higher c. Equal d. None of the above
Ans. a
20. _____ units which are internal to the network and do not directly interact with the environment. a. Input
b. Output c. Hidden d. None of the above
Ans. c
21. In a ____ network there is an ordering imposed on the nodes in a network: if there is a connection from unit a to unit b then there can-not be a connection from b to a. a. Feedback b. Feed-Forward c. None of the above
Ans. b
22. _____ contains the multiple logical values and these values are the truth values of a variable or problem between 0 and 1. This concept was introduced by Lofti Zadeh in 1965 a. Boolean Logic b. Fuzzy Logic c. None of the above
Ans. b
23. ______is a module or component, which takes the fuzzy set inputs generated by the Inference Engine, and then transforms them into a crisp value. a. Fuzzification b. Defuzzification c. Inference Engine d. None of the above
Ans. b
24. The most common application of time series analysis is forecasting future values of a numeric value using the ______ structure of the ____ a. Shares,data b. Temporal,data c. Permanent,data d. None of these
Ans. b
25. Identify the component of a time series
a. Temporal b. Shares c. Trend d. Policymakers
Ans. c
26. Predictable pattern that recurs or repeats over regular intervals. Seasonality is often observed within a year or less: This define the term__________ a. Trend b. Seasonality c. Cycles d. Recession
Ans. b
27. ________Learning uses a training set that consists of a set of pattern pairs: an input pattern and the corresponding desired (or target) output pattern. The desired output may be regarded as the ‘network’s ‘teacher” for that input a. Unsupervised b. Supervised c. Modular d. Object
Ans. b
28. The _______ perceptron consists of a set of input units connected by a single layer of weights to a set of output units a. Multi layer b. Single layer c. Hidden layer d. None of these
Ans. b
29. If we add another layer of weights to a single-layer perceptron, we find a new set of units that are neither input nor output units; for simplicity, a network with more than two layers of weights is considered a a. Single layer perceptron b. Multi layer perceptron c. Hidden layer d. None of these
Ans. b
30. Patterns that repeat over a certain period of time a. Seasonal b. Trend c. None of the above d. Both of the above
Ans. a
31. Which of the following is characteristic of best machine learning method ?
a. Fast b. Accuracy c. Scalable d. All of the Mentioned
Ans. d
32. Supervised learning differs from unsupervised clustering in that supervised learning requires a. at least one input attribute. b. input attributes to be categorical. c. at least one output attribute. d. ouput attriubutes to be categorical. Ans. d
33. Supervised learning and unsupervised clustering both require at least one a. hidden attribute. b. output attribute. c. input attribute. d. categorical attribute. Ans. c
34. Which statement is true about prediction problems? a. The output attribute must be categorical. b. The output attribute must be numeric. c. The resultant model is designed to determine future outcomes. d. The resultant model is designed to classify current behavior. Ans. c
35. Which statement is true about neural network and linear regression models? a. Both models require input attributes to be numeric. b. Both models require numeric attributes to range between 0 and 1. c. The output of both models is a categorical attribute value. d. Both techniques build models whose output is determined by a linear sum of weighted input attribute values. Ans. a
36. A feed-forward neural network is said to be fully connected when a. all nodes are connected to each other. b. all nodes at the same layer are connected to each other. c. all nodes at one layer are connected to all nodes in the next higher layer. d. all hidden layer nodes are connected to all output layer nodes. Ans. c
37. Machine learning techniques differ from statistical techniques in that machine learning methods a. typically assume an underlying distribution for the data. b. are better able to deal with missing and noisy data. c. are not able to explain their behavior. d. have trouble with large-sized datasets. Ans. b
38. This supervised learning technique can process both numeric and categorical input attributes. a. linear regression b. Bayes classifier c. logistic regression d. backpropagation learning Ans. b
39. This technique associates a conditional probability value with each data instance. a. linear regression b. logistic regression c. simple regression
d. multiple linear regression Ans. b
40. Logistic regression is a ________ regression technique that is used to model data having a _____outcome. a. linear, numeric b. linear, binary c. nonlinear, numeric d. nonlinear, binary Ans. d
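The "nonlinear, binary" answer reflects the standard logistic (sigmoid) model, which squashes a linear combination of the inputs into a probability for a binary outcome:

$$ P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}} $$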
41. Which of the following problems is best solved using time-series analysis? a. Predict whether someone is a likely candidate for having a stroke. b. Determine if an individual should be given an unsecured loan. c. Develop a profile of a star athlete. d. Determine the likelihood that someone will terminate their cell phone contract.
Ans. d
42. Which of the following is true about Naive Bayes? a. Assumes that all the features in a dataset are equally important b. Assumes that all the features in a dataset are independent c. Both A and B d. None of the above options Ans. c
43. Simple regression assumes a __________ relationship between the input attribute and output attribute. a. linear b. quadratic c. reciprocal d. inverse
Ans. a
44. With Bayes classifier, missing data items are a. treated as equal compares. b. treated as unequal compares. c. replaced with a default value. d. ignored.
45. What is Machine learning? a. The autonomous acquisition of knowledge through the use of computer programs b. The autonomous acquisition of knowledge through the use of manual programs c. The selective acquisition of knowledge through the use of computer programs d. The selective acquisition of knowledge through the use of manual programs
Ans: a
46. Automated vehicle is an example of ______ a. Supervised learning b. Unsupervised learning c. Active learning d. Reinforcement learning
Ans: a
47. Multilayer perceptron network is a. Usually, the weights are initially set to small random values b. A hard-limiting activation function is often used c. The weights can only be updated after all the training vectors have been presented d. Multiple layers of neurons allow for less complex decision boundaries than a single layer
Ans: a
48. Neural networks a. optimize a convex cost function b. cannot be used for regression as well as classification c. always output values between 0 and 1 d. can be used in an ensemble
Ans: d
49. In neural networks, nonlinear activation functions such as sigmoid, tanh, and ReLU a. speed up the gradient calculation in backpropagation, as compared to linear units b. are applied only to the output units c. help to learn nonlinear decision boundaries d. always output values between 0 and 1
Ans: c
50. Which of the following is a disadvantage of decision trees?
a. Factor analysis b. Decision trees are robust to outliers c. Decision trees are prone to be overfit d. None of the above
Ans: c
51. Back propagation is a learning technique that adjusts weights in the neural network by propagating weight changes. a. Forward from source to sink b. Backward from sink to source c. Forward from source to hidden nodes d. Backward from sink to hidden nodes
Ans: b
52. Identify the following activation function:
φ(V) = Z + 1 / (1 + exp(−X·V + Y)), where Z, X, Y are parameters
a. Step function b. Ramp function c. Sigmoid function d. Gaussian function
Ans: c
53. An artificial neuron receives n inputs x1, x2, ..., xn with weights w1, w2, ..., wn attached to the input links. The weighted sum _________________ is computed and passed on to a non-linear filter Φ, called the activation function, to release the output. a. Σ wi b. Σ xi c. Σ wi + Σ xi d. Σ wi * xi
Ans: d
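As a quick illustration of questions 52 and 53, here is a minimal R sketch of a single artificial neuron (the inputs and weights are made-up values, not taken from the questions): the weighted sum Σ wi*xi is computed and then passed through a sigmoid activation.
x <- c(0.5, -1.0, 2.0)                # inputs x1..xn (illustrative values)
w <- c(0.4, 0.3, 0.1)                 # weights w1..wn (illustrative values)
v <- sum(w * x)                       # weighted sum, option (d) of question 53
phi <- function(v) 1 / (1 + exp(-v))  # sigmoid activation, the family in question 52
print(v)                              # 0.1
print(phi(v))                         # about 0.525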
54. With Bayes classifier, missing data items are a. treated as equal compares. b. treated as unequal compares. c. replaced with a default value. d. ignored.
Ans:b
55. Machine learning techniques differ from statistical techniques in that machine learning methods a. typically assume an underlying distribution for the data. b. are better able to deal with missing and noisy data. c. are not able to explain their behavior. d. have trouble with large-sized datasets.
Ans: b
56. Which of the following is true about Naive Bayes?
a. Assumes that all the features in a dataset are equally important b. Assumes that all the features in a dataset are independent c. Both a and b d. None of the above options
Ans: c
57. How many terms are required for building a Bayes model? a. 1 b. 2 c. 3 d. 4
Ans: c
58. What does the Bayesian network provides? a. Complete description of the domain b. Partial description of the domain c. Complete description of the problem d. None of the mentioned
Ans: a
59. How the Bayesian network can be used to answer any query? a. Full distribution b. Joint distribution c. Partial distribution d. All of the mentioned
Ans: b
60. In which of the following learning the teacher returns reward and punishment to learner? a. Active learning b. Reinforcement learning c. Supervised learning d. Unsupervised learning
Ans: b
61. Which of the following is the model used for learning? a. Decision trees b. Neural networks c. Propositional and FOL rules d. All of the mentioned
Ans: d
UNIT - 3
1. Which of the following can be considered as the correct process of Data Mining? a. Infrastructure, Exploration, Analysis, Interpretation, Exploitation b. Exploration, Infrastructure, Analysis, Interpretation, Exploitation c. Exploration, Infrastructure, Interpretation, Analysis, Exploitation d. Exploration, Infrastructure, Analysis, Exploitation, Interpretation
Answer: a
Explanation: The process of data mining contains many sub-processes in a specific order. The correct order in which all sub-processes of data mining executes is Infrastructure, Exploration, Analysis, Interpretation, and Exploitation.
2. Which of the following is an essential process in which the intelligent methods are applied to extract data patterns? a. Warehousing b. Data Mining c. Text Mining d. Data Selection
Answer: b
Explanation: Data mining is a type of process in which several intelligent methods are used to extract meaningful data from the huge collection (or set) of data.
3. What are the functions of Data Mining? a. Association and correctional analysis classification b. Prediction and characterization c. Cluster analysis and Evolution analysis d. All of the above
Answer: d
Explanation: In data mining, there are several functionalities used for performing the different types of tasks. The common functionalities used in data mining are cluster analysis, prediction, characterization, and evolution. Still, the association and correctional analysis classification are also one of the important functionalities of data mining.
4. Which attribute is not indicative of data streaming?
a. Limited amount of memory b. Limited amount of processing time c. Limited amount of input data d. Limited amount of processing power
Ans. c
5. Which of the following statements about data streaming is true?
a. Stream data is always unstructured data. b. Stream data often has a high velocity. c. Stream elements cannot be stored on disk. d. Stream data is always structured data.
Ans. b
6. Which of the following statements about sampling are correct? a. Sampling reduces the amount of data fed to a subsequent data mining algorithm b. Sampling reduces the diversity of the data stream c. Sampling increases the amount of data fed to a data mining algorithm d. Sampling algorithms often need multiple passes over the data
Ans. a
7. Which of the following statements about sampling are correct? a. Sampling reduces the diversity of the data stream b. Sampling increases the amount of data fed to a data mining algorithm c. Sampling algorithms often need multiple passes over the data d. Sampling aims to keep statistical properties of the data intact
Ans. d
8. What is the main difference between standard reservoir sampling and min-wise sampling?
a. Reservoir sampling makes use of randomly generated numbers whereas min-wise sampling does not. b. Min-wise sampling makes use of randomly generated numbers whereas reservoir sampling does not. c. Reservoir sampling requires a stream to be processed sequentially, whereas min-wise does not. d. For larger streams, reservoir sampling creates more accurate samples than min-wise sampling.
Ans. c
9. A Bloom filter guarantees no
a. false positives b. false negatives
c. false positives and false negatives d. false positives or false negatives, depending on the Bloom filter type
Ans. b
10. Which of the following statements about standard Bloom filters is correct?
a. It is possible to delete an element from a Bloom filter. b. A Bloom filter always returns the correct result. c. It is possible to alter the hash functions of a full Bloom filter to create more space. d. A Bloom filter always returns TRUE when testing for a previously added element.
Ans. d
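To make the Bloom filter behaviour in questions 9 and 10 concrete, here is a small R sketch under simplifying assumptions (a 16-bit array and two toy hash functions invented for the example): adding an element sets k bits, and a membership test returns TRUE only if all k bits are set, so a previously added element always tests TRUE (no false negatives), while a fresh element may collide and test TRUE (a false positive).
n <- 16
bits <- rep(FALSE, n)                          # the bit array, initially all 0's
hashes <- list(function(x) (3 * x + 1) %% n + 1,
               function(x) (5 * x + 2) %% n + 1)
bf_add  <- function(x) for (h in hashes) bits[h(x)] <<- TRUE
bf_test <- function(x) all(sapply(hashes, function(h) bits[h(x)]))
bf_add(10); bf_add(25)
print(bf_test(10))   # TRUE: added elements always test TRUE
print(bf_test(99))   # may be TRUE (false positive) or FALSE, but never a false negative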
11. The FM-sketch algorithm uses the number of zeros the binary hash value ends in to make an estimation. Which of the following statements is true about the hash tail?
a. Any specific bit pattern is equally suitable to be used as hash tail. b. Only bit patterns with more 0's than 1's are equally suitable to be used as hash tails. c. Only the bit patterns 0000000..00 (list of 0s) or 111111..11 (list of 1s) are suitable hash tails. d. Only the bit pattern 0000000..00 (list of 0s) is a suitable hash tail.
Ans. a
12. The FM-sketch algorithm can be used to:
a. Estimate the number of distinct elements. b. Sample data with a time-sensitive window. c. Estimate the frequent elements. d. Determine whether an element has already occurred in previous stream data.
Ans. a
13. The DGIM algorithm was developed to estimate the count of 1's occurring within the last k bits of a stream window of size N. Which of the following statements is true about the estimate of the number of 0's based on DGIM?
a. The number of 0's cannot be estimated at all. b. The number of 0's can be estimated with a maximum guaranteed error. c. To estimate the number of 0's and 1's with a guaranteed maximum error, DGIM has to be employed twice, once creating buckets based on 1's and once creating buckets based on 0's. d. None of the above
Ans. b
14. Which of the following statements about the standard DGIM algorithm are false? a. DGIM operates on a time-based window
b. DGIM reduces memory consumption through a clever way of storing counts c. In DGIM, the size of a bucket is always a power of two d. The maximum number of buckets has to be chosen beforehand. Ans. d
15. Which of the following statements about the standard DGIM algorithm are false? a. DGIM operates on a time-based window b. The buckets contain the count of 1's and each 1's specific position in the stream c. DGIM reduces memory consumption through a clever way of storing counts d. In DGIM, the size of a bucket is always a power of two Ans. b
16. What are DGIM’s maximum error boundaries?
a. DGIM always underestimates the true count; at most by 25% b. DGIM either underestimates or overestimates the true count; at most by 50% c. DGIM always overestimates the count; at most by 50% d. DGIM either underestimates or overestimates the true count; at most by 25%
Ans. b
17. Which algorithm should be used to approximate the number of distinct elements in a data stream?
a. Misra-Gries b. Alon-Matias-Szegedy c. DGIM d. None of the above
Ans. d
18. Which of the following statements about Bloom filters are correct?
a. A Bloom filter has the same properties as a standard Hashmap data structure in Java (java.util.HashMap). b. A Bloom filter is full if no more hash functions can be added to it. c. A Bloom filter always returns FALSE when testing for an element that was not previously added d. A Bloom filter always returns TRUE when testing for a previously added element
Ans. d
19. Which of the following statements about Bloom filters are correct?
a. An empty Bloom filter (no elements added to it) will always return FALSE when testing for an element b. A Bloom filter has the same properties as a standard Hashmap data structure in Java (java.util.HashMap). c. A Bloom filter is full if no more hash functions can be added to it. d. A Bloom filter always returns FALSE when testing for an element that was not previously added Ans. a
20. Which of the following streaming windows show valid bucket representations according to the DGIM rules?
a. 1 0 1 1 1 0 1 0 1 1 1 1 0 1 0 1 b. 1 0 1 1 1 0 0 0 0 1 1 0 0 0 1 0 1 1 1 0 0 1 c. 1 1 1 1 0 0 1 1 1 0 1 0 1 d. 1 0 1 1 0 0 0 1 0 1 1 1 0 1 1 0 0 1 0 1 1
Ans. d
21. For which of the following streams is the second-order moment greater than 45?
a. 10 5 5 10 10 10 1 1 1 10 b. 1 1 1 1 1 5 10 10 5 1 c. 10 10 10 10 10 5 5 5 5 5 d. None of above Ans. c
22. For which of the following streams is the second-order moment greater than 50?
a. 10 5 5 10 10 10 1 1 1 10 b. 10 10 10 10 10 10 10 10 10 10 c. 10 10 10 10 10 5 5 5 5 5 d. None of above
Ans. b
23. Which of the following statements is correct about data mining?
a. It can be referred to as the procedure of mining knowledge from data b. Data mining can be defined as the procedure of extracting information from a set of the data c. The procedure of data mining also involves several other processes like data cleaning, data transformation, and data integration d. All of the above
Answer: d
Explanation: The term data mining can be defined as the process of extracting information from the massive collection of data. In other words, we can also say that data mining is the procedure of mining useful knowledge from a huge set of data.
24. The classification of the data mining system involves:
a. Database technology b. Information Science c. Machine learning d. All of the above
Answer: d
Explanation: Generally, the classification of a data mining system depends on the following criteria: Database technology, machine learning, visualization, information science, and several other disciplines.
25. The issues like efficiency, scalability of data mining algorithms comes under_______
a. Performance issues b. Diverse data type issues
c. Mining methodology and user interaction d. All of the above
Answer: a
Explanation: In order to extract information effectively from a huge collection of data in databases, the data mining algorithm must be efficient and scalable. Therefore the correct answer is A.
26. In data streams, data is……..
a. continuous
b. discrete
c. scattered
d. none of above
Ans. a
27. In mining data streams, the data should be of…
a. same type
b. different type
c. binary
d. none of above
Ans. b
28. Which one is not a component of a data stream management system?
a. Data stream
b. system processor
c. SQL engine
d. storage
Ans. c
29. Which of the following statements is true about mining data streams?
a. Data rate is not controlled by the system
b. Data type is same for all data streams
c. Data is divided in chunks and later stored in database
d. none of above
Ans. a
30. Which of the following is the data stream source?
a. Sensors data
b. Web/traffic camera data
c. Image data
d. all of above
Ans. d
31. What are the different operations on stream?
a. Sampling
b. counting distinct elements
c. Filtering
d. All of above
Ans. d
32. Which one is not the data stream process?
a. Finding frequent item
b. sampling
c. Filtering
d. Counting distinct elements.
Ans. a
33. In Flajolet-Martin algorithm if the stream contains n elements with m of them unique, this algorithm runs in
a. O(n) time
b. constant time
c. O(2n) time
d. O(3n)time
Ans. a
34. Which algorithm should we implement to know how many distinct users visited the website till now or in the last 2 hours?
a. SVM
b. DGIM
c. FM
d. Clustering
Ans. c
35. In the FM algorithm, we use the estimate ............... for the number of distinct elements seen in the stream.
a. 2^R
b. 3R
c. 2R
d. None of the above
Ans. a
36. In a sliding window of size w, an element arriving at time t expires at
a. w
b. t
c. t + w
d. t - w
Ans. c
37. Real-time data stream is _______
a. sequence of data items that arrive in some order and may be seen only once.
b. sequence of data items that arrive in some order and may be seen twice.
c. sequence of data items that arrive in same order
d. sequence of data items that arrive in different order
ans. a
38. Which of the following statements about standard Bloom filters is correct?
a. It is possible to delete an element from a Bloom filter.
b. A Bloom filter always returns the correct result.
C. It is possible to alter the hash functions of a full Bloom filter to create more space.
d. A Bloom filter always returns TRUE when testing for a previously added element.
Ans. d
39. What are DGIM’s maximum error boundaries?
a. DGIM always underestimates the true count; at most by 25%
b. DGIM either underestimates or overestimates the true count; at most by 50%
c. DGIM always overestimates the count; at most by 50%
d. DGIM either underestimates or overestimates the true count; at most by 25%
Ans. b
40. Which of the following statements about the standard DGIM algorithm are false?
a. DGIM operates on a time-based window.
b. DGIM reduces memory consumption through a clever way of storing counts.
c. In DGIM, the size of a bucket is always a power of two
d. The maximum number of buckets has to be chosen beforehand.
Ans. d
41. In DGIM, when forming a bucket, _____
a. Every bucket should have at least one 1, else no bucket can be formed
b. Every bucket should have at least two 1, else no bucket can be formed
c. Every bucket should have at least three 1, else no bucket can be formed
d. Every bucket should have at least four 1, else no bucket can be formed
Ans. a
42. Which attribute is not indicative for data streaming?
a. Limited amount of memory
b. Limited amount of processing time
c. Limited amount of input data
d. Limited amount of processing power
Ans. c
43. In Filtering Streams____________
a. Accept those tuples in the stream that meet a criterion.
b. Accept data in the stream that meet a criterion.
c. Accept those class in the stream that meet a criterion
d. Accept rows in the stream that meet a criterion.
Ans. a
44. A Bloom filter consists of_________
a. An array of n bits, initially all 0’s.
b. An array of 1 bits, initially all 0’s.
c. An array of 2 bits, initially all 0’s.
d. An array of n bits, initially all 1’s.
Ans. a
45. The purpose of the Bloom filter is to allow____________
a. through all stream elements whose keys are in Set
b. through all stream elements whose keys are in class
c. through all data elements whose keys are in Set
d. through all touple elements whose keys are in Set
Ans. a
46. The second order moment for the stream a, b, c, b, d, a, c, d, a, b, d, c, a, a, b is
a. 60
b. 59
c. 51
d. 71
Ans. b
47. The second order moment for the stream a, b, c, b, d, a, c, d, a, b, d, c, a, a, b using Alon-Matias-Szegedy Algorithm is
a. 59
b. 67
c. 55
d. 75
Ans. c
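For questions 46 and 47, the exact second-order moment is the sum of the squared occurrence counts, and the Alon-Matias-Szegedy (AMS) estimate averages n*(2*c - 1) over a few chosen positions. A short R sketch, assuming the three AMS variables are placed at positions 3, 8 and 13 (an assumption made here for the worked example), reproduces both keyed answers:
stream <- c("a","b","c","b","d","a","c","d","a","b","d","c","a","a","b")
n <- length(stream)
print(sum(table(stream)^2))            # exact second moment: 25 + 16 + 9 + 9 = 59 (question 46)
ams <- function(pos) {                 # AMS estimate for one variable starting at 'pos'
  x <- stream[pos]
  c_count <- sum(stream[pos:n] == x)   # occurrences of that element from 'pos' onward
  n * (2 * c_count - 1)
}
print(mean(sapply(c(3, 8, 13), ams)))  # (75 + 45 + 45) / 3 = 55 (question 47)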
48. Which of the following stream clustering algorithm can be used for counting 1's in a stream
a. FM Algorithm
b. PCY Algorithm
c. BDMO Algorithm
d. SON Algorithm
Ans. c
49. The time between elements of one stream
a. need not be uniform
b. need to be uniform
c. must be 1ms.
d. must be 1ns
Ans. a
50. In Bloom filter an array of n bits is initialized with
a. all 0s
b. all 1s
c. half 0s and half 1s
d. all -1
Ans. a
51. Estimate the number of distinct elements appearing in a stream using the Flajolet-Martin algorithm. The given stream is 4, 2, 5, 9, 1, 6, 3, 7 and the hash function is h(x) = (3x + 1) mod 32.
a. 12
b. 16
c. 8
d. 9
Ans. b
52. Estimate the number of distinct elements appearing in a stream using the Flajolet-Martin algorithm. The given stream is 4, 2, 5, 9, 1, 6, 3, 7 and the hash function is h(x) = (x + 6) mod 32.
a. 8
b. 16
c. 12
d. 20
Ans. a
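A short R check for questions 51 and 52: hash each element, count the trailing zeros of the binary hash value, take the maximum R over the stream, and estimate the number of distinct elements as 2^R.
trailing_zeros <- function(v) { r <- 0; while (v > 0 && v %% 2 == 0) { r <- r + 1; v <- v %/% 2 }; r }
fm_estimate <- function(stream, h) 2^max(sapply(h(stream), trailing_zeros))
stream <- c(4, 2, 5, 9, 1, 6, 3, 7)
print(fm_estimate(stream, function(x) (3 * x + 1) %% 32))  # 16 (question 51)
print(fm_estimate(stream, function(x) (x + 6) %% 32))      # 8  (question 52)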
UNIT -4
1. Movie Recommendation systems are an example of: 1. Classification 2. Clustering 3. Reinforcement Learning 4. Regression
Options: a. 2 only b. 1 and 3 c. 1 and 2 d. 2 and 3
Ans. 1 and 3
2. In the following given diagram, which type of clustering is used?
a. Hierarchical b. Naive Bayes c. Partitional d. None of the above
Answer: a
Explanation: In the above-given diagram, the hierarchical type of clustering is used. Hierarchical clustering categorizes data across a range of scales by building a cluster tree, so the correct answer is A.
3. Which of the following statements is incorrect about the hierarchal clustering?
a. The hierarchal type of clustering is also known as the HCA b. The choice of an appropriate metric can influence the shape of the cluster c. In general, the splits and merges both are determined in a greedy manner d. All of the above
Answer: d
Explanation: Statements a, b, and c are all correct statements about hierarchical clustering.
4. Which one of the following can be considered as the final output of the hierarchal type of clustering?
a. A tree which displays how close things are to each other b. Assignment of each point to clusters c. Final estimation of cluster centroids d. None of the above
Answer: a
Explanation: Hierarchical clustering follows an agglomerative (merging) approach, and its final output is a tree (dendrogram) showing how close the items are to each other.
5. Which one of the following statements about the K-means clustering is incorrect?
a. The goal of the k-means clustering is to partition (n) observation into (k) clusters b. K-means clustering can be defined as the method of quantization c. The nearest neighbor is the same as the K-means
d. All of the above
Answer: c
Explanation: K-means clustering and the k-nearest-neighbour algorithm are unrelated techniques, so statement c is incorrect.
6. Which of the following statements about hierarchal clustering is incorrect?
a. The hierarchical clustering can primarily be used for the aim of exploration b. The hierarchical clustering should not be primarily used for the aim of exploration c. Both A and B d. None of the above
Answer: b
Explanation: Hierarchical clustering is a deterministic technique and is well suited to exploration, so the incorrect statement is b.
7. Which one of the clustering technique needs the merging approach?
a. Partitional b. Naive Bayes c. Hierarchical d. Both a and c
Answer: c
Explanation: Hierarchical clustering is one of the most commonly used methods for analysing social network data. In this method, nodes are compared with one another on the basis of their similarities, and larger groups are formed by merging nodes or groups of nodes that share similar characteristics.
8. Which one of the following correctly defines the term cluster?
a. Group of similar objects that differ significantly from other objects b. Symbolic representation of facts or ideas from which information can potentially be extracted c. Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm d. All of the above
Answer: a
Explanation: The term "cluster" refers to a set of similar objects or items that differ significantly from the other available objects; clustering groups objects that share characteristics, so the correct answer is A.
9. Hierarchical clustering should be mainly used for exploration.
a. True
b. False
c. May be true or false
d. None of the above
Answer: a
10. K-means clustering involves a number of iterations and is not deterministic.
a. True
b. False
c. May be true or false
d. None of the above
Answer: a
11. Which function is used for k-means clustering?
(A). k-means
(B). k-mean
(C). heatmap
(D). none of the mentioned
MCQ Answer: a
12. Which is needed by K-means clustering?
(A). defined distance metric
(B). number of clusters
(C). initial guess as to cluster centroids
(D). all of these
MCQ Answer: d
13. Which is conclusively produced by Hierarchical Clustering?
(A). final estimation of cluster centroids
(B). tree showing how nearby things are to each other
(C). assignment of each point to clusters
(D). all of these
MCQ Answer: b
14. Which clustering technique requires a merging approach?
(A). Partitional
(B). Hierarchical
(C). Naive Bayes
(D). None of the mentioned
MCQ Answer: b
15. Which of the following is finally produced by Hierarchical Clustering?
a) final estimate of cluster centroids
b) tree showing how close things are to each other
c) assignment of each point to clusters
d) all of the mentioned
Ans. b
16. Which of the following is required by K-means clustering?
a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of the mentioned
Ans. d
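The requirements listed in question 16 (a distance metric, the number of clusters, and an initial guess for the centroids) map directly onto R's built-in kmeans() function. A minimal sketch on made-up 2-D data (the points and seed are invented for the example):
set.seed(42)
pts <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),   # 20 points around (0, 0)
             matrix(rnorm(40, mean = 5), ncol = 2))   # 20 points around (5, 5)
fit <- kmeans(pts, centers = 2)   # k = 2 clusters; Euclidean distance, random initial centroids
print(fit$centers)                # final cluster centroids
print(table(fit$cluster))         # how many points fell into each cluster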
17. Point out the wrong statement.
a) k-means clustering is a method of vector quantization
b) k-means clustering aims to partition n observations into k clusters
c) k-nearest neighbor is same as k-means
d) None of the mentioned
Ans. c
18. Which of the following combination is incorrect?
a) Continuous – euclidean distance
b) Continuous – correlation similarity
c) Binary – manhattan distance
d) None of the mentioned
Ans. d
19. Which of the following can act as possible termination conditions in K-Means?
1. For a fixed number of iterations.
2. Assignment of observations to clusters does not change between iterations. Except for cases with a bad local minimum.
3. Centroids do not change between successive iterations.
4. Terminate when RSS falls below a threshold.
Options:
a. 1, 3 and 4
b. 1, 2 and 3
c. 1, 2 and 4
d. All of the above
Ans. d
20. A collection of one or more items is called an _____ (a) Itemset (b) Support (c) Confidence (d) Support Count
Ans. a
21. The frequency of occurrence of an itemset is called its _____ (a) Support (b) Confidence (c) Support Count (d) Rules
Ans. c
22. An itemset whose support is greater than or equal to a minimum support threshold is a ______ (a) Itemset (b) Frequent Itemset (c) Infrequent items (d) Threshold values
Ans. b
23. What techniques can be used to improve the efficiency of the Apriori algorithm? (a) Hash-based techniques (b) Transaction Increases (c) Sampling (d) Cleaning
Ans. a
24. What do you mean by support(A)? (a) Total number of transactions containing A (b) Total number of transactions not containing A (c) Number of transactions containing A / Total number of transactions (d) Number of transactions not containing A / Total number of transactions
Ans. c
25. Which of the following is a direct application of frequent itemset mining? (a) Social Network Analysis (b) Market Basket Analysis (c) Outlier Detection (d) Intrusion Detection
Ans. b
26. When do you consider an association rule interesting? (a) If it only satisfies min_support (b) If it only satisfies min_confidence (c) If it satisfies both min_support and min_confidence (d) There are other measures to check
Ans. c
27. What is the relation between a candidate and frequent itemsets? (a) A candidate itemset is always a frequent itemset (b) A frequent itemset must be a candidate itemset (c) No relation between these two (d) Strong relation with transactions
Ans. b
28. Which of the following is not a frequent pattern mining algorithm? (a) Apriori (b) FP growth (c) Decision trees (d) Eclat
Ans. c
29. Which algorithm requires fewer scans of data? (a) Apriori (b) FP Growth (c) Naive Bayes (d) Decision Trees
Ans. b
30. For the question given below, consider the following transactions:
I1, I2, I3, I4, I5, I6
I7, I2, I3, I4, I5, I6
I1, I8, I4, I5
I1, I9, I10, I4, I6
I10, I2, I4, I11, I5
With support as 0.6, find all frequent itemsets.
(a) <I1>, <I2>, <I4>, <I5>, <I6>, <I1, I4>, <I2, I4>, <I2, I5>, <I4, I5>, <I4, I6>, <I2, I4, I5>
(b) <I2>, <I4>, <I5>, <I2, I4>, <I2, I5>, <I4, I5>, <I2, I4, I5>
(c) <I11>, <I4>, <I5>, <I6>, <I1, I4>, <I5, I4>, <I11, I5>, <I4, I6>, <I2, I4, I5>
(d) <I1>, <I4>, <I5>, <I6>
Ans. a
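A brute-force support check in R confirms the keyed answer (a quick sketch; the tx list below just re-enters the five transactions from question 30, and a minimum support of 0.6 over 5 transactions means at least 3 occurrences):
tx <- list(c("I1","I2","I3","I4","I5","I6"),
           c("I7","I2","I3","I4","I5","I6"),
           c("I1","I8","I4","I5"),
           c("I1","I9","I10","I4","I6"),
           c("I10","I2","I4","I11","I5"))
# Support count of an itemset = number of transactions containing all of its items.
support <- function(items) sum(sapply(tx, function(t) all(items %in% t)))
print(support(c("I1", "I4")))        # 3 -> frequent (>= 0.6 * 5)
print(support(c("I2", "I4", "I5")))  # 3 -> frequent
print(support("I3"))                 # 2 -> not frequent, so I3 is dropped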
31. What will happen if support is reduced? (a) Number of frequent itemsets remains the same (b) Some itemsets will add to the current set of frequent itemsets (c) Some itemsets will become infrequent while others will become frequent (d) Can not say
Ans. b
32. What is association rule mining? (a) Same as frequent itemset mining (b) Finding of strong association rules using frequent itemsets (c) Using association to analyze correlation rules (d) Finding Itemsets for future trends
Ans. b
33. What does FP growth algorithm do?
a. It mines all frequent patterns through pruning rules with lesser support b. It mines all frequent patterns through pruning rules with higher support c. It mines all frequent patterns by constructing a FP tree d. All of these
Ans. c
34. Which technique finds the frequent itemsets in just two database scans?
a. Partitioning b. sampling c. hashing d. None of these
Ans. a
35. Which of the following is true?
a. Both apriori and FP-Growth uses horizontal data format b. Both apriori and FP-Growth uses vertical data format c. Both a and b d. None of these
Ans. a
36. What is the principle on which Apriori algorithm work?
a. If a rule is infrequent, its specialized rules are also infrequent b. If a rule is infrequent, its generalized rules are also infrequent c. Both a and b d. None of these
Ans. a
37. What are closed frequent itemsets?
a. A closed itemset b. A frequent itemset c. An itemset which is both closed and frequent d. None of these
Ans. c
38. What are maximal frequent itemsets?
a. A frequent itemset none of whose supersets is frequent
b. A frequent itemset whose superset is also frequent
c. Both a and b
d. None of these
Ans. a
39. What is frequent pattern growth?
a. Same as frequent itemset mining b. Use of hashing to make discovery of frequent itemsets more efficient c. Mining of frequent itemsets without candidate generation d. None of these
Ans. c
40. When is sub-itemset pruning done?
a. A frequent itemset ‘P’ is a proper subset of another frequent itemset ‘Q’ b. Support (P) = Support(Q) c. When both a and b is true
d. When a is true and b is not
Ans. c
41. The Apriori algorithm works in a ______ and ______ fashion.
a. top-down and depth-first b. top-down and breadth-first c. bottom-up and depth-first d. bottom-up and breadth-first
Ans. d
42. In association rule mining, the generation of the frequent itemsets is the computationally intensive step.
a. TRUE b. FALSE c. Both a and b d. None of these
Ans. a
43. The number of iterations in Apriori ______
a. increases with the size of the data b. decreases with the increase in size of the data c. increases with the size of the maximum frequent set d. decreases with increase in size of the maximum frequent set
Ans. c
44. Which of the following are interestingness measures for association rules?
a. recall b. lift c. accuracy d. compactness
Ans. b
45. In Apriori algorithm, if 1 item-sets are 100, then the number of candidate 2 item-sets are
a. 100 b. 4950 c. 200 d. 5000
Ans. b
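The count in question 45 follows from pairing up the 100 frequent 1-itemsets: C(100, 2) = 100 * 99 / 2 = 4950, which a one-line R check confirms.
print(choose(100, 2))   # 4950 candidate 2-itemsets from 100 frequent 1-itemsets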
46. Significant Bottleneck in the Apriori algorithm is
a. Finding frequent itemsets b. pruning c. Candidate generation d. Number of iterations
Ans. c
47. Which Association Rule would you prefer
a. High support and medium confidence b. High support and low confidence c. Low support and high confidence d. Low support and low confidence
Ans. c
48. The apriori property means
a. If a set cannot pass a test, its supersets will also fail the same test b. To decrease the efficiency, do level-wise generation of frequent item sets c. To improve the efficiency, do level-wise generation of frequent item sets d. If a set can pass a test, its supersets will fail the same test
Ans. a
49. If an item set ‘XYZ’ is a frequent item set, then all subsets of that frequent item set are
a. undefined b. not frequent c. frequent d. cant say
Ans. c
50. To determine association rules from frequent item sets
a. Only minimum confidence needed b. Neither support nor confidence needed c. Both minimum support and confidence are needed d. Minimum support is needed
Ans. c
51. If {A,B,C,D} is a frequent itemset, which of the following candidate rules is not possible?
a. C –> A b. D –> ABCD c. A –> BC d. B –> ADC
Ans. b
UNIT - 5
1. Who developed Hadoop?
A. Apache Software Foundation B. Hadoop Software Foundation C. Sun Microsystems D. Bell Labs
Ans : A
Explanation: Hadoop was developed by the Apache Software Foundation.
2. Hadoop is written in which language?
A. C B. C++ C. Java D. Python
Ans : C
Explanation: Hadoop is written in Java.
3. What was the Initial release date of hadoop?
A. 1st April 2007 B. 1st April 2006 C. 1st April 2008 D. 1st April 2005
Ans : B
Explanation: Hadoop's initial release was on April 1, 2006.
4. What license is Hadoop distributed under?
A. Apache License 2.1 B. Apache License 2.2 C. Apache License 2.0 D. Apache License 1.0
Ans : C
Explanation: Hadoop is Open Source, released under Apache 2 license.
5. IBM and ________ have announced a major initiative to use Hadoop to support university courses in distributed computer programming.
A. Google B. Apple C. Facebook D. Microsoft
Ans : A
Explanation: Google and IBM announced a joint university initiative to address internet-scale computing challenges.
6. On which platform does Hadoop run?
A. Bare metal B. Debian C. Cross-platform D. Unix-Like
Ans : C
Explanation: Hadoop has support for cross platform operating system.
7. Which of the following is not Features Of Hadoop?
A. Suitable for Big Data Analysis B. Scalability C. Robust D. Fault Tolerance
Ans : C
Explanation: "Robust" is not one of the listed features of Hadoop.
8. The MapReduce algorithm contains two important tasks, namely __________.
A. mapped, reduce B. mapping, Reduction C. Map, Reduction D. Map, Reduce
Ans : D
Explanation: The MapReduce algorithm contains two important tasks, namely Map and Reduce.
9. _____ takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
A. Map B. Reduce C. Both A and B D. Node
Ans : A
Explanation: Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
10 ______ task, which takes the output from a map as an input and combines those data tuples into a smaller set of tuples.
A. Map B. Reduce C. Node D. Both A and B
Ans : B
Explanation: Reduce task, which takes the output from a map as an input and combines those data tuples into a smaller set of tuples.
11. In how many stages the MapReduce program executes?
A. 2 B. 3 C. 4 D. 5
Ans : B
Explanation: MapReduce program executes in three stages, namely map stage, shuffle stage, and reduce stage.
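Purely as an illustration of the programming model in question 11 (this is plain base R, not Hadoop; the documents are made up), a word count can be expressed with R's Map() and Reduce() higher-order functions in the same map, shuffle, and reduce stages:
docs <- c("big data analytics", "big data tools", "data streams")
pairs <- unlist(Map(function(doc) strsplit(doc, " ")[[1]], docs))   # map stage: split each document into words
grouped <- split(rep(1, length(pairs)), pairs)                      # shuffle stage: group a count of 1 per occurrence by word
counts <- sapply(grouped, function(v) Reduce(`+`, v))               # reduce stage: sum the counts for each word
print(counts)   # analytics=1, big=2, data=3, streams=1, tools=1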
12. Which of the following is used to schedules jobs and tracks the assign jobs to Task tracker?
A. SlaveNode B. MasterNode C. JobTracker D. Task Tracker
Ans : C
Explanation: JobTracker : Schedules jobs and tracks the assign jobs to Task tracker.
13. Which of the following is used for an execution of a Mapper or a Reducer on a slice of data?
A. Task B. Job C. Mapper D. PayLoad
Ans : A
Explanation: Task : An execution of a Mapper or a Reducer on a slice of data.
14. Which of the following commands runs a DFS admin client?
A. secondaryadminnode B. nameadmin C. dfsadmin D. adminsck
Ans : C
Explanation: dfsadmin : Runs a DFS admin client.
15. Point out the correct statement.
A. MapReduce tries to place the data and the compute as close as possible B. Map Task in MapReduce is performed using the Mapper() function C. Reduce Task in MapReduce is performed using the Map() function D. None of the above
Ans : A
Explanation: This feature of MapReduce is "Data Locality".
16. Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in ____________
A. C B. C# C. Java D. None of the above
Ans : C
Explanation: Hadoop Pipes is a SWIG- compatible C++ API to implement MapReduce applications (non JNITM based).
17. The number of maps is usually driven by the total size of ____________
A. Inputs B. Output C. Task D. None of the above
Ans : A
Explanation: Total size of inputs means the total number of blocks of the input files.
18. What is full form of HDFS?
A. Hadoop File System B. Hadoop Field System C. Hadoop File Search D. Hadoop Field search
Ans : A
Explanation: Hadoop File System was developed using distributed file system design.
19. HDFS works in a __________ fashion.
A. worker-master fashion B. master-slave fashion C. master-worker fashion D. slave-master fashion
Ans : B
Explanation: HDFS follows the master-slave architecture.
20. Which of the following are the Goals of HDFS?
A. Fault detection and recovery B. Huge datasets C. Hardware at data D. All of the above
Ans : D
Explanation: All the above option are the goals of HDFS.
21. ________ NameNode is used when the Primary NameNode goes down.
A. Rack B. Data C. Secondary D. Both A and B
Ans : C
Explanation: Secondary namenode is used for all time availability and reliability.
22. The minimum amount of data that HDFS can read or write is called a _____________.
A. Datanode B. Namenode C. Block D. None of the above
Ans : C
Explanation: The minimum amount of data that HDFS can read or write is called a Block.
23. The default block size is ______.
A. 32MB B. 64MB C. 128MB D. 16MB
Ans : B
Explanation: The default block size is 64MB, but it can be increased as per the need to change in HDFS configuration.
24. For every node (Commodity hardware/System) in a cluster, there will be a _________.
A. Datanode B. Namenode C. Block D. None of the above
Ans : A
Explanation: For every node (Commodity hardware/System) in a cluster, there will be a datanode.
25. Which of the following is not Features Of HDFS?
A. It is suitable for the distributed storage and processing. B. Streaming access to file system data. C. HDFS provides file permissions and authentication. D. Hadoop does not provide a command interface to interact with HDFS.
Ans : D
Explanation: The correct feature is Hadoop provides a command interface to interact with HDFS.
26. HDFS is implemented in _____________ language.
A. Perl B. Python C. Java D. C
Ans : C
Explanation: HDFS is implemented in Java and any computer which can run Java can host a NameNode/DataNode on it.
27. During start up, the ___________ loads the file system state from the fsimage and the edits log file.
A. Datanode B. Namenode C. Block D. ActionNode
Ans : B
Explanation: During start-up, the NameNode loads the file system state from the fsimage file and then applies the changes recorded in the edits log.
28. Which of the following is not true about Pig?
A. Apache Pig is an abstraction over MapReduce B. Pig cannot perform all the data manipulation operations in Hadoop. C. Pig is a tool/platform which is used to analyze larger sets of data representing them as data flows. D. None of the above
Ans : B
Explanation: Pig is generally used with Hadoop; we can perform all the data manipulation operations in Hadoop using Apache Pig.
29. Which of the following is/are a feature of Pig?
A. Rich set of operators B. Ease of programming C. Extensibility D. All of the above
Ans : D
Explanation: All options are the following Features of Pig.
30. In which year apache Pig was released?
A. 2005 B. 2006
C. 2007 D. 2008
Ans : B
Explanation: In 2006, Apache Pig was developed as a research project.
31. Pig mainly operates in how many modes?
A. 2 B. 3 C. 4 D. 5
Ans : A
Explanation: You can run Pig (execute Pig Latin statements and Pig commands) in two modes: interactive mode and batch mode.
32. Which of the following company has developed PIG?
A. Google B. Yahoo C. Microsoft D. Apple
Ans : B
Explanation: Apache Pig was developed as a research project at Yahoo, especially to create and execute MapReduce jobs on every dataset.
33. Which of the following function is used to read data in PIG?
A. Write B. Read C. Perform D. Load
Ans : D
Explanation: PigStorage is the default load function.
34. __________ is a framework for collecting and storing script-level statistics for Pig Latin.
A. Pig Stats B. PStatistics C. Pig Statistics D. All of the above
Ans : C
Explanation: The new Pig statistics and the existing Hadoop statistics can also be accessed via the Hadoop job history file.
35. Which of the following is true statement?
A. Pig is a high level language. B. Performing a Join operation in Apache Pig is pretty simple.
C. Apache Pig is a data flow language. D. All of the above
Ans : D
Explanation: All option are true statement.
36. Which of the following will compile the Pigunit?
A. $pig_trunk ant pigunit-jar B. $pig_tr ant pigunit-jar C. $pig_ ant pigunit-jar D. $pigtr_ ant pigunit-jar
Ans : A
Explanation: The compile will create the pigunit.jar file.
37. Point out the wrong statement.
A. Pig can invoke code in language like Java Only B. Pig enables data workers to write complex data transformations without knowing Java C. Pig's simple SQL-like scripting language is called Pig Latin, and appeals to developers already familiar with scripting languages and SQL D. Pig is complete, so you can do all required data manipulations in Apache Hadoop with Pig
Ans : A
Explanation: Through the User Defined Functions(UDF) facility in Pig, Pig can invoke code in many languages like JRuby, Jython and Java.
38. Which of the following is/are INCORRECT with respect to Hive?
A. Hive provides SQL interface to process large amount of data B. Hive needs a relational database like oracle to perform query operations and store data. C. Hive works well on all files stored in HDFS D. Both A and B
Ans : B
Explanation: Hive needs a relational database like oracle to perform query operations and store data is incorrect with respect to Hive.
39. Which of the following is not a Features of HiveQL?
A. Supports joins B. Supports indexes C. Support views D. Support Transactions
Ans : D
Explanation: Support Transactions is not a Features of HiveQL.
40. Which of the following operator executes a shell command from the Hive shell?
A. | B. !
C. # D. $
Ans : B
Explanation: Exclamation operator is for execution of command.
41. Hive uses _________ for logging.
A. logj4 B. log4l C. log4i D. log4j
Ans : D
Explanation: By default Hive will use hive-log4j.default in the conf/ directory of the Hive installation.
42. HCatalog is installed with Hive, starting with Hive release ___________
A. 0.10.0 B. 0.9.0 C. 0.11.0 D. 0.12.0
Ans : C
Explanation: hcat commands can be issued as hive commands, and vice versa.
43. _______ supports a new command shell Beeline that works with HiveServer2.
A. HiveServer2 B. HiveServer3 C. HiveServer4 D. HiveServer5
Ans : A
Explanation: The Beeline shell works in both embedded mode as well as remote mode.
44. The ________ allows users to read or write Avro data as Hive tables.
A. AvroSerde B. HiveSerde C. SqlSerde D. HiveQLSerde
Ans : A
Explanation: AvroSerde understands compressed Avro files.
45. Which of the following data types is not supported by Hive?
A. map B. record C. string D. enum
Ans : D
Explanation: Hive has no concept of enums.
46. We need to store skill set of MCQs(which might have multiple values) in MCQs table, which of the following is the best way to store this information in case of Hive?
A. Create a column in MCQs table of STRUCT data type B. Create a column in MCQs table of MAP data type C. Create a column in MCQs table of ARRAY data type D. As storing multiple values in a column of MCQs itself is a violation
Ans : C
Explanation: Option C is correct.
47. Letsfindcourse is generating huge amount of data. They are generating huge amount of sensor data from different courses which was unstructured in form. They moved to Hadoop framework for storing and analyzing data. What technology in Hadoop framework, they can use to analyse this unstructured data?
A. MapReduce programming B. Hive C. RDBMS D. None of the above Ans : A
Explanation: MapReduce programming is the right answer.
48. Which of the following is correct statement?
A. HBase is a distributed column-oriented database B. Hbase is not open source C. Hbase is horizontally scalable. D. Both A and C
Ans : D
Explanation: HBase is a distributed column-oriented database built on top of the Hadoop file system. It is an open-source project and is horizontally scalable.
49. Which of the following is not a feature of Hbase?
A. HBase is lateral scalable. B. It has automatic failure support. C. It provides consistent read and writes. D. It has easy java API for client.
Ans : A
Explanation: Option A is incorrect because HBase is linearly scalable.
50. When did HBase was first released?
A. April 2007 B. March 2007 C. February 2007 D. May 2007 Ans : C
Explanation: HBase was first released in February 2007. Later in January 2008, HBase became a sub project of Apache Hadoop.
51. Apache HBase is a non-relational database modeled after Google's _________
A. BigTop B. Bigtable C. Scanner D. FoundationDB
Ans : B
Explanation: Bigtable acts up on Google File System, likewise Apache HBase works on top of Hadoop and HDFS.
52. HBaseAdmin and ____________ are the two important classes in this package that provide DDL functionalities.
A. HTableDescriptor B. HDescriptor C. HTable D. HTabDescriptor
Ans : A
Explanation: Java provides an Admin API to achieve DDL functionalities through programming
53. which of the following is correct statement?
A. HBase provides fast lookups for larger tables. B. It provides low latency access to single rows from billions of records C. HBase is a database built on top of the HDFS. D. All of the above
Ans : D
Explanation: All the options are correct.
54. HBase supports a ____________ interface via Put and Result.
A. bytes-in/bytes-out B. bytes-in C. bytes-out D. None of the above
Ans : A
Explanation: Input could be strings, numbers, complex objects, or even images as long as they can rendered as bytes.
55. Which command is used to disable all the tables matching the given regex?
A. remove all B. drop all C. disable_all D. None of the above
Ans : C
Explanation: The syntax for disable_all command is as follows : hbase > disable_all 'r.*'
56. _________ is the main configuration file of HBase.
A. hbase.xml B. hbase-site.xml C. hbase-site-conf.xml D. hbase-conf.xml
Ans : B
Explanation: Set the data directory to an appropriate location by opening the HBase home folder in /usr/local/HBase.
57. which of the following is incorrect statement?
A. HBase is built for wide tables B. Transactions are there in HBase. C. HBase has de-normalized data. D. HBase is good for semi-structured as well as structured data.
Ans : B
Explanation: No transactions are there in HBase.
58. R was created by?
A. Ross Ihaka B. Robert Gentleman C. Both A and B D. Ross Gentleman
Ans : C
Explanation: R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team.
59. R allows integration with the procedures written in the?
A. C B. Ruby C. Java D. Basic
Ans : A
Explanation: R allows integration with the procedures written in the C, C++, .Net, Python or FORTRAN languages for efficiency.
60. R is free software distributed under a GNU-style copyleft, and an official part of the GNU project called?
A. GNU A B. GNU S C. GNU L D. GNU R
Ans : B
Explanation: R is free software distributed under a GNU-style copyleft, and an official part of the GNU project called GNU S.
61. R made its first appearance in?
A. 1992 B. 1995 C. 1993 D. 1994
Ans : C
Explanation: R made its first appearance in 1993.
62. Which of the following is true about R?
A. R is a well-developed, simple and effective programming language B. R has an effective data handling and storage facility C. R provides a large, coherent and integrated collection of tools for data analysis. D. All of the above
Ans : D
Explanation: All of the above statement are true.
63. Point out the wrong statement?
A. Setting up a workstation to take full advantage of the customizable features of R is a straightforward thing B. q() is used to quit the R program C. R has an inbuilt help facility similar to the man facility of UNIX D. Windows versions of R have other optional help systems also
Ans : B
Explanation: help command is used for knowing details of particular command in R.
64. Command lines entered at the console are limited to about ________ bytes
A. 4095 B. 4096 C. 4097 D. 4098
Ans : A
Explanation: Command lines entered at the R console are limited to about 4095 bytes.
65. R language is a dialect of which of the following languages?
A. s B. c C. sas D. matlab
Ans : A
Explanation: The R language is a dialect of S which was designed in the 1980s. Since the early 90’s the life of the S language has gone down a rather winding path. The scoping rules for R are the main feature that makes it different from the original S language.
66. How many atomic vector types does R have?
A. 3 B. 4 C. 5 D. 6
Ans : D
Explanation: R language has 6 atomic data types. They are logical, integer, real, complex, string (or character) and raw. There is also a class for “raw” objects, but they are not commonly used directly in data analysis.
67. R files has an extension _____.
A. .S B. .RP C. .R D. .SP
Ans : C
Explanation: All R files have an extension .R. R provides a mechanism for recalling and re-executing previous commands. All S programmed files will have an extension .S. But R has many functions than S.
68. What will be output for the following code?
v <- TRUE
print(class(v))
A. logical B. Numeric C. Integer D. Complex
Ans : A
Explanation: It produces the following result : [1] "logical"
69. What will be output for the following code?
v <- "TRUE"
print(class(v))
A. logical B. Numeric C. Integer D. Character
Ans : D
Explanation: It produces the following result : [1] "character"
70. In R programming, the very basic data types are the R-objects called?
A. Lists B. Matrices C. Vectors D. Arrays
Ans : C
Explanation: In R programming, the very basic data types are the R-objects called vectors
71. Data Frames are created using the?
A. frame() function B. data.frame() function C. data() function D. frame.data() function
Ans : B
Explanation: Data Frames are created using the data.frame() function
72. Which functions gives the count of levels?
A. level B. levels C. nlevels D. nlevel
Ans : C
Explanation: Factors are created using the factor() function. The nlevels functions gives the count of levels.
73. Point out the correct statement?
A. Empty vectors can be created with the vector() function B. A sequence is represented as a vector but can contain objects of different classes C. "raw” objects are commonly used directly in data analysis D. The value NaN represents undefined value
Ans : A
Explanation: A vector can only contain objects of the same class.
74. What will be the output of the following R code?
> x <- vector("numeric", length = 10)
> x
A. 1 0 B. 0 0 0 0 0 0 0 0 0 0 C. 0 1 D. 0 0 1 1 0 1 1 0
Ans : B
Explanation: You can also use the vector() function to initialize vectors.
75. What will be output for the following code?
> sqrt(-17)
A. -4.02 B. 4.02 C. 3.67 D. NAN
Ans : D
Explanation: The square root of a negative number is not defined for real numbers, so sqrt(-17) returns NaN along with a warning; sqrt(-17+0i) would give a complex result.
76. _______ function returns a vector of the same size as x with the elements arranged in increasing order.
A. sort() B. orderasc() C. orderby() D. sequence()
Ans : A
Explanation: There are other more flexible sorting facilities available like order() or sort.list() which produce a permutation to do the sorting.
77. What will be the output of the following R code?
> m <- matrix(nrow = 2, ncol = 3)
> dim(m)
A. 3 3 B. 3 2 C. 2 3 D. 2 2
Ans : C
Explanation: dim() returns the number of rows followed by the number of columns, so the result is 2 3.
78. Which loop executes a sequence of statements multiple times and abbreviates the code that manages the loop variable?
A. for B. while C. do-while D. repeat
Ans : D
Explanation: repeat loop : Executes a sequence of statements multiple times and abbreviates the code that manages the loop variable.
79. Which of the following true about for loop?
A. Repeats a statement or group of statements while a given condition is true. It tests the condition before executing the loop body. B. it tests the condition at the end of the loop body. C. Both A and B D. None of the above
Ans : B
Explanation: for loop : Like a while statement, except that it tests the condition at the end of the loop body.
80. Which statement simulates the behavior of R switch?
A. Next B. Previous C. break D. goto
Ans : A
Explanation: The next statement simulates the behavior of R switch.
81. In which statement terminates the loop statement and transfers execution to the statement immediately following the loop?
A. goto B. switch C. break D. label
Ans : C
Explanation: Break : Terminates the loop statement and transfers execution to the statement immediately following the loop.
82. Point out the wrong statement?
A. Multi-line expressions with curly braces are just not that easy to sort through when working on the command line B. lappy() loops over a list, iterating over each element in that list C. lapply() does not always returns a list D. You cannot use lapply() to evaluate a function multiple times each with a different argument
Ans : C
Explanation: lapply() always returns a list, regardless of the class of the input.
83. The mapply() function is a multivariate apply of sorts which applies a function in parallel over a set of arguments.
A. TRUE B. FALSE C. Can be true or false D. Can not say
Ans : A
Explanation: True, The mapply() function is a multivariate apply of sorts which applies a function in parallel over a set of arguments.
84 Which of the following is valid body of split function?
A. function (x, f) B. function (x, f, drop = FALSE, …) C. function (x, drop = FALSE, …) D. function (drop = FALSE, …)
Ans : B
Explanation: x is a vector (or list) or data frame
85. Which of the following character skip during execution?
v <- LETTERS[1:6]
for ( i in v) {
if (i == "D") {
next
}
print(i)
}
A. A B. B C. C D. D
Ans : D
Explanation: When the above code is compiled and executed, it produces the following result : [1] "A" [1] "B" [1] "C" [1] "E" [1] "F"
86. What will be output for the following code?
v <- LETTERS[1]
for ( i in v) {
print(v)
}
A. A B. A B C. A B C D. A B C D
Ans : A
Explanation: The output for the following code : [1] "A"
87. What will be output for the following code?
v <- LETTERS["A"]
for ( i in v) {
print(v)
}
A. A B. NAN C. NA D. Error
Ans : C
Explanation: The output for the following code : [1] NA
88. An R function is created by using the keyword?
A. fun B. function C. declare D. extends
Ans : B
Explanation: An R function is created by using the keyword function.
89. What will be output for the following code?
print(mean(25:82))
A. 1526 B. 53.5 C. 50.5 D. 55
Ans : B
Explanation: The code will find mean of numbers from 25 to 82 that is 53.5
90. Point out the wrong statement?
A. Functions in R are “second class objects” B. The writing of a function allows a developer to create an interface to the code, that is explicitly specified with a set of parameters
C. Functions provides an abstraction of the code to potential users D. Writing functions is a core activity of an R programmer
Ans : A
Explanation: Functions in R are “first class objects”, which means that they can be treated much like any other R object.
91. What will be output for the following code?
> paste("a", "b", se = ":")
A. a+b B. a:b C. a-b D. None of the above
Ans : D
Explanation: With the paste() function, the arguments sep and collapse must be named explicitly and in full if the default values are not going to be used.
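A quick R check of why option d is keyed: in paste(), the sep argument comes after ..., so a shortened name such as se is not matched to sep and is instead pasted as just another string.
print(paste("a", "b", se = ":"))   # "a b :"  - 'se' is treated as data, not as sep
print(paste("a", "b", sep = ":"))  # "a:b"    - sep must be named in full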
92. Which function in R language is used to find out whether the means of 2 groups are equal to each other or not?
A. f.tests() B. l.tests() C. t.test() D. p.tests() Ans : C
Explanation: The t.test() function in R is used to test whether the means of two groups are equal to each other.
93. What will be the output of log (-5.8) when executed on R console?
A. NA B. NAN C. 0.213 D. Error Ans : B
Explanation: Executing log(-5.8) on the R console produces NaN (Not a Number) together with a warning, because the logarithm of a negative number is not defined for real values.
94. Which function is preferred over sapply, as vapply allows the programmer to specify the output type?
A. Lapply B. Japply C. Vapply D. Zapply Ans : C
Explanation: Vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use. simplify2array() is the utility called from sapply() when simplify is not false and is similarly called from mapply().
95. How will you check if an element is present in a vector?
A. Match() B. Dismatch() C. Mismatch() D. Search()
Ans : A
Explanation: It can be done using the match () function- match () function returns the first appearance of a particular element. The other way is to use %in% which returns a Boolean value either true or false.
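A small R illustration of the two approaches mentioned in the explanation (the vector v is made up for the example):
v <- c(10, 20, 30, 20)
print(match(20, v))   # 2: position of the first occurrence of 20
print(20 %in% v)      # TRUE
print(25 %in% v)      # FALSE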
96. You can check to see whether an R object is NULL with the _________ function.
A. is.null() B. is.nullobj() C. null() D. as.nullobj()
Ans : A
Explanation: It is sometimes useful to allow an argument to take the NULL value, which might indicate that the function should take some specific action.
97. In the base graphics system, which function is used to add elements to a plot?
A. Boxplot() B. Text() C. Treat() D. Both A and B
Ans : D
Explanation: In the base graphics system, boxplot or text function is used to add elements to a plot.
98. Which of the following syntax is used to install forecast package?
A. install.pack("forecast") B. install.packages("cast") C. install.packages("forecast") D. install.pack("forecastcast")
Ans : C
Explanation: forecast is used for time series analysis
99. Which splits a data frame and returns a data frame?
A. apply B. ddply C. stats D. plyr
Ans : B
Explanation: ddply splits a data frame and returns a data frame.
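A minimal sketch, assuming the plyr package is installed:
library(plyr)
# split iris by Species and return a data frame of per-group means
ddply(iris, "Species", summarise, mean_sepal = mean(Sepal.Length))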
100. Which of the following is an R package for the exploratory analysis of genetic and genomic data?
A. adeg B. adegenet C. anc D. abd
Ans : B
Explanation: This package contains Classes and functions for genetic data analysis within the multivariate framework.
101. Which of the following contains functions for processing uniaxial minute-to-minute accelerometer data?
A. accelerometry B. abc C. abd D. anc
Ans : A
Explanation: This package contains a collection of functions that perform operations on time-series accelerometer data, such as identifying non-wear time, flagging minutes that are part of an activity bout, and finding the maximum 10-minute average count value.
102. ______ Uses Grieg-Smith method on 2 dimensional spatial data.
A. G.A. B. G2db C. G.S. D. G1DBN
Ans : C
Explanation: The function returns a GriegSmith object which is a matrix with block sizes, sum of squares for each block size as well as mean sums of squares. G1DBN is a package performing Dynamic Bayesian Network Inference.
103. Which of the following package provide namespace management functions not yet present in base R?
A. stringr B. nbpMatching C. messagewarning D. namespace
Ans : D
Explanation: The package namespace is one of the most confusing parts of building a package. nbpMatching contains functions for non-bipartite optimal matching.
104. What will be the output of the following R code?
install.packages(c("devtools", "roxygen2"))
A. Develops the tools B. Installs the given packages C. Exits R studio D. Nothing happens
Ans : B
Explanation: Make sure you have the latest version of R and then run the above code to get the packages you’ll need. It installs the given packages. Confirm that you have a recent version of RStudio.
105. A bundled package is a package that’s been compressed into a ______ file.
A. Double B. Single C. Triple D. No File
Ans : B
Explanation: A bundled package is a package that’s been compressed into a single file. A source package is just a directory with components like R/, DESCRIPTION, and so on.
106. library() is not useful when developing a package since you have to install the package first.
A. TRUE B. FALSE C. Can be true or false D. Can not say
Ans : A
Explanation: library() is not useful when developing a package since you have to install the package first. A library is a simple directory containing installed packages.
107. DESCRIPTION uses a very simple file format called DCF.
A. TRUE B. FALSE C. Can be true or false D. Can not say
Ans : A
Explanation: DESCRIPTION uses a very simple file format called DCF, the Debian control format. When you first start writing packages, you’ll mostly use these metadata to record what packages are needed to run your package.
108. How much data (in MB) does HDFS store in each block, which can be scaled at any time? 1. 32 2. 64 3. 128 4. 256 Answer: 128
109. _____ provides performance through distribution of data and fault tolerance through replication 1. HDFS 2. PIG 3. HIVE 4. HADOOP Answer: HDFS
110. ______ is a programming model for writing applications that can process Big Data in parallel on multiple nodes. 1. HDFS 2. MAP REDUCE 3. HADOOP 4. HIVE Answer: MAP REDUCE
111. ____ takes the grouped key-value paired data as input and runs a Reducer function on each one of them. 1. MAPPER 2. REDUCER 3. COMBINER 4. PARTITIONER Answer: REDUCER
112. ____ is a type of local Reducer that groups similar data from the map phase into identifiable sets. 1. MAPPER 2. REDUCER 3. COMBINER 4. PARTITIONER Answer: COMBINER
113. While installing Hadoop, how many XML files are edited, and which are they? 1. core-site.xml 2. hdfs-site.xml 3. mapred.xml 4. yarn.xml Answer: core-site.xml
**********Module - 1 (Introduction)**********
1.According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop? (A). Big data management and data mining (B). Data warehousing and business intelligence (C). Management of Hadoop clusters (D). Collecting and storing unstructured data Answer -A
2.What are the main components of Big Data? (A). MapReduce (B). HDFS (C). YARN (D). All of these Answer -D
3.What are the different features of Big Data Analytics? (A). Open-Source (B). Scalability (C). Data Recovery (D). All the above Answer -D
4.According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop? (A). Big data management and data mining (B). Data warehousing and business intelligence (C). Management of Hadoop clusters (D). Collecting and storing unstructured data Answer -A
5.What are the four V’s of Big Data? (A). Volume (B). Velocity (C). Variety (D). All the above Answer -D
6.IBM and ________ have announced a major initiative to use Hadoop to support university courses in distributed computer programming. (A). Google Latitude (B). Android (operating system) (C). Google Variations (D). Google Answer: d Explanation: Google and IBM Announce University Initiative to Address Internet-Scale.
7.Point out the correct statement. (A). Hadoop is an ideal environment for extracting and transforming small volumes of data (B). Hadoop stores data in HDFS and supports data compression/decompression (C). The Giraph framework is less useful than a MapReduce job to solve graph and machine learning (D). None of the mentioned
Answer: b Explanation: Data compression can be achieved using compression algorithms like bzip2, gzip, LZO, etc. Different algorithms can be used in different scenarios based on their capabilities.
8.What license is Hadoop distributed under? (A). Apache License 2.0 (B). Mozilla Public License (C). Shareware (D). Commercial Answer: a Explanation: Hadoop is Open Source, released under Apache 2 license.
9.Sun also has the Hadoop Live CD ________ project, which allows running a fully functional Hadoop cluster using a live CD. (A). OpenOffice.org (B). OpenSolaris (C). GNU (D). Linux Answer: b Explanation: The OpenSolaris Hadoop LiveCD project built a bootable CD-ROM image.
10.Which of the following genres does Hadoop produce? (A). Distributed file system (B). JAX-RS (C). Java Message Service (D). Relational Database Management System Answer: a Explanation: The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to the user.
11.What was Hadoop written in? (A). Java (software platform) (B). Perl (C). Java (programming language) (D). Lua (programming language) Answer: c Explanation: The Hadoop framework itself is mostly written in the Java programming language, with some native code in C and command-line utilities written as shell-scripts.
12.Which of the following platforms does Hadoop run on? (A). Bare metal (B). Debian (C). Cross-platform (D). Unix-like Answer: c Explanation: Hadoop has support for cross-platform operating system.
13.Hadoop achieves reliability by replicating the data across multiple hosts and hence does not require ________ storage on hosts. (A). RAID (B). Standard RAID levels (C). ZFS (D). Operating system
Answer: a Explanation: With the default replication value, 3, data is stored on three nodes: two on the same rack, and one on a different rack.
14.Above the file systems comes the ________ engine, which consists of one Job Tracker, to which client applications submit MapReduce jobs. (A). MapReduce (B). Google (C). Functional programming (D). Facebook Answer: a Explanation: The MapReduce engine is used to distribute work around a cluster.
15.The Hadoop list includes the HBase database, the Apache Mahout ________ system, and matrix operations. (A). Machine learning (B). Pattern recognition (C). Statistical classification (D). Artificial intelligence Answer: a Explanation: The Apache Mahout project’s goal is to build a scalable machine learning tool.
16.As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including _______________ (A). Improved data storage and information retrieval (B). Improved extract, transform and load features for data integration (C). Improved data warehousing functionality (D). Improved security, workload management, and SQL support Answer: d Explanation: Adding security to Hadoop is challenging because all the interactions do not follow the classic client-server pattern.
17.Point out the correct statement. (A). Hadoop do need specialized hardware to process the data (B). Hadoop 2.0 allows live stream processing of real-time data (C). In Hadoop programming framework output files are divided into lines or records (D). None of the mentioned Answer: b Explanation: Hadoop batch processes data distributed over a number of computers ranging in 100s and 1000s.
18.According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop? (A). Big data management and data mining (B). Data warehousing and business intelligence (C). Management of Hadoop clusters (D). Collecting and storing unstructured data Answer: a Explanation: Data warehousing integrated with Hadoop would give a better understanding of data.
19.Hadoop is a framework that works with a variety of related tools. Common cohorts include ____________ (A). MapReduce, Hive and HBase (B). MapReduce, MySQL and Google Apps (C). MapReduce, Hummer and Iguana (D). MapReduce, Heron and Trumpet Answer: a
Explanation: To use Hive with HBase you’ll typically want to launch two clusters, one to run HBase and the other to run Hive.
20.Point out the wrong statement. (A). Hadoop processing capabilities are huge and its real advantage lies in the ability to process terabytes & petabytes of data (B). Hadoop uses a programming model called “MapReduce”, all the programs should conform to this model in order to work on Hadoop platform (C). The programming model, MapReduce, used by Hadoop is difficult to write and test (D). All of the mentioned Answer: c Explanation: The programming model, MapReduce, used by Hadoop is simple to write and test.
21.What was Hadoop named after? (A). Creator Doug Cutting’s favorite circus act (B). Cutting’s high school rock band (C). The toy elephant of Cutting’s son (D). A sound Cutting’s laptop made during Hadoop development Answer: c Explanation: Doug Cutting, Hadoop creator, named the framework after his child’s stuffed toy elephant.
22.All of the following accurately describe Hadoop, EXCEPT ____________ (A). Open-source (B). Real-time (C). Java-based (D). Distributed computing approach Answer: b Explanation: Apache Hadoop is an open-source software framework for distributed storage and distributed processing of Big Data on clusters of commodity hardware.
23.__________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data. (A). MapReduce (B). Mahout (C). Oozie (D). All of the mentioned Answer: a Explanation: MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm.
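The map-then-reduce idea itself can be sketched in a few lines of plain R (this is only an illustration of the model, not Hadoop):
docs <- c("big data big ideas", "data analytics")
words <- unlist(lapply(docs, function(d) strsplit(d, " ")[[1]]))  # map: emit one entry per word
counts <- tapply(rep(1, length(words)), words, sum)               # shuffle + reduce: group identical keys and sum their counts
counts   # analytics 1, big 2, data 2, ideas 1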
24.__________ has the world’s largest Hadoop cluster. (A). Apple (B). Datamatics (C). Facebook (D). None of the mentioned Answer: c Explanation: Facebook has many Hadoop clusters, the largest among them is the one that is used for Data warehousing.
25.Facebook Tackles Big Data With _______ based on Hadoop. (A). ‘Project Prism’ (B). ‘Prism’ (C). ‘Project Big’ (D). ‘Project Data’
Answer: a Explanation: Prism automatically replicates and moves data wherever it’s needed across a vast network of computing facilities.
26.________ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets. (A). Pig Latin (B). Oozie (C). Pig (D). Hive Answer: c Explanation: Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs.
27.Point out the correct statement. (A). Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data (B). Hive is a relational database with SQL support (C). Pig is a relational database with SQL support (D). All of the mentioned Answer: a Explanation: Hive is a SQL-based data warehouse system for Hadoop that facilitates data summarization, ad hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.
28._________ hides the limitations of Java behind a powerful and concise Clojure API for Cascading. (A). Scalding (B). HCatalog (C). Cascalog (D). All of the mentioned Answer: c Explanation: Cascalog also adds Logic Programming concepts inspired by Datalog. Hence the name “Cascalog” is a contraction of Cascading and Datalog.
29.Hive also supports custom extensions written in ____________ (A). C# (B). Java (C). C (D). C++ Answer: b Explanation: Hive also supports custom extensions written in Java, including user-defined functions (UDFs) and serializer-deserializers for reading and optionally writing custom formats.
30.Point out the wrong statement. (A). Elastic MapReduce (EMR) is Facebook’s packaged Hadoop offering (B). Amazon Web Service Elastic MapReduce (EMR) is Amazon’s packaged Hadoop offering (C). Scalding is a Scala API on top of Cascading that removes most Java boilerplate (D). All of the mentioned Answer: a Explanation: Rather than building Hadoop deployments manually on EC2 (Elastic Compute Cloud) clusters, users can spin up fully configured Hadoop installations using simple invocation commands, either through the AWS Web Console or through command-line tools.
31.________ is the most popular high-level Java API in Hadoop Ecosystem (A). Scalding
(B). HCatalog (C). Cascalog (D). Cascading Answer: d Explanation: Cascading hides many of the complexities of MapReduce programming behind more intuitive pipes and data flow abstractions.
32.___________ is a general-purpose computing model and runtime system for distributed data analytics. (A). Mapreduce (B). Drill (C). Oozie (D). None of the mentioned Answer: a Explanation: Mapreduce provides a flexible and scalable foundation for analytics, from traditional reporting to leading-edge machine learning algorithms.
33.The Pig Latin scripting language is not only a higher-level data flow language but also has operators similar to ____________ (A). SQL (B). JSON (C). XML (D). All of the mentioned Answer: a Explanation: Pig Latin, in essence, is designed to fill the gap between the declarative style of SQL and the low-level procedural style of MapReduce.
34._______ jobs are optimized for scalability but not latency. (A). Mapreduce (B). Drill (C). Oozie (D). Hive Answer: d Explanation: Hive Queries are translated to MapReduce jobs to exploit the scalability of MapReduce.
35.______ is a framework for performing remote procedure calls and data serialization. (A). Drill (B). BigTop (C). Avro (D). Chukwa Answer: c Explanation: In the context of Hadoop, Avro can be used to pass data from one program or language to another.
**********Module - 2 (Hadoop HDFS & Map Reduce)**********
1.A ________ serves as the master and there is only one NameNode per cluster. (A). Data Node (B). NameNode (C). Data block (D). Replication
Answer: b Explanation: All the metadata related to HDFS including the information about data nodes, files stored on HDFS, and Replication, etc. are stored and maintained on the NameNode.
2.Point out the correct statement. (A). DataNode is the slave/worker node and holds the user data in the form of Data Blocks (B). Each incoming file is broken into 32 MB by default (C). Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance (D). None of the mentioned Answer: a Explanation: There can be any number of DataNodes in a Hadoop Cluster.
3.HDFS works in a __________ fashion. (A). master-worker (B). master-slave (C). worker/slave (D). all of the mentioned Answer: a Explanation: NameNode serves as the master and each DataNode serves as a worker/slave
4.________ NameNode is used when the Primary NameNode goes down. (A). Rack (B). Data (C). Secondary (D). None of the mentioned Answer: c Explanation: Secondary namenode is used for all time availability and reliability.
5.Point out the wrong statement. (A). Replication Factor can be configured at a cluster level (Default is set to 3). and also at a file level (B). Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode (C). User data is stored on the local file system of DataNodes (D). DataNode is aware of the files to which the blocks stored on it belong to Answer: d Explanation: NameNode is aware of the files to which the blocks stored on it belong to.
6.Which of the following scenario may not be a good fit for HDFS? (A). HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file (B). HDFS is suitable for storing data related to applications requiring low latency data access (C). HDFS is suitable for storing data related to applications requiring low latency data access (D). None of the mentioned Answer: a Explanation: HDFS can be used for storing archive data since it is cheaper as HDFS allows storing the data on low cost commodity hardware while ensuring a high degree of fault-tolerance.
7.The need for data replication can arise in various scenarios like ____________ (A). Replication Factor is changed (B). DataNode goes down (C). Data Blocks get corrupted (D). All of the mentioned Answer: d Explanation: Data is replicated across different DataNodes to ensure a high degree of fault-tolerance.
8.________ is the slave/worker node and holds the user data in the form of Data Blocks.
(A). DataNode (B). NameNode (C). Data block (D). Replication Answer: a Explanation: A DataNode stores data in the [HadoopFileSystem]. A functional filesystem has more than one DataNode, with data replicated across them.
9.HDFS provides a command line interface called __________ used to interact with HDFS. (A). “HDFS Shell” (B). “FS Shell” (C). “DFS Shell” (D). None of the mentioned Answer: b Explanation: The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS).
10.HDFS is implemented in _____________ programming language. (A). C++ (B). Java (C). Scala (D). None of the mentioned Answer: b Explanation: HDFS is implemented in Java and any computer which can run Java can host a NameNode/DataNode on it.
11.For YARN, the ___________ Manager UI provides host and port information. (A). Data Node (B). NameNode (C). Resource (D). Replication Answer: c Explanation: All the metadata related to HDFS including the information about data nodes, files stored on HDFS, and Replication, etc. are stored and maintained on the NameNode.
12.Point out the correct statement. (A). The Hadoop framework publishes the job flow status to an internally running web server on the master nodes of the Hadoop cluster (B). Each incoming file is broken into 32 MB by default (C). Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance (D). None of the mentioned Answer: a Explanation: The web interface for the Hadoop Distributed File System (HDFS). shows information about the Name Node itself.
13.For ________ the HBase Master UI provides information about the HBase Master uptime. (A). HBase (B). Oozie (C). Kafka (D). All of the mentioned Answer: a Explanation: HBase Master UI provides information about the number of live, dead and transitional servers, logs, ZooKeeper information, debug dumps, and thread stacks.
14.During start up, the ___________ loads the file system state from the fsimage and the edits log file. (A). DataNode (B). NameNode (C). ActionNode (D). None of the mentioned Answer: b Explanation: HDFS is implemented on any computer which can run Java can host a NameNode/DataNode on it
15. A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the JobTracker. (A). MapReduce (B). Mapper (C). TaskTracker (D). JobTracker Answer: c Explanation: TaskTracker receives the information necessary for the execution of a Task from JobTracker, Executes the Task, and Sends the Results back to JobTracker.
16.Point out the correct statement. (A). MapReduce tries to place the data and the compute as close as possible (B). Map Task in MapReduce is performed using the Mapper(). function (C). Reduce Task in MapReduce is performed using the Map(). function (D). All of the mentioned Answer: a Explanation: This feature of MapReduce is “Data Locality”.
17.___________ part of the MapReduce is responsible for processing one or more chunks of data and producing the output results. (A). Maptask (B). Mapper (C). Task execution (D). All of the mentioned Answer: a Explanation: Map Task in MapReduce is performed using the Map(). function.
18._________ function is responsible for consolidating the results produced by each of the Map(). functions/tasks. (A). Reduce (B). Map (C). Reducer (D). All of the mentioned Answer: a Explanation: Reduce function collates the work and resolves the results.
19.Point out the wrong statement. (A). A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner (B). The MapReduce framework operates exclusively on <key, value> pairs (C). Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods (D). None of the mentioned Answer: d Explanation: The MapReduce framework takes care of scheduling tasks, monitoring them and re-executes the failed tasks.
20.Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in ____________
(A). Java (B). C (C). C# (D). None of the mentioned Answer: a Explanation: Hadoop Pipes is a SWIG-compatible C++ API to implement MapReduce applications (non JNI based).
21.________ is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer. (A). Hadoop Strdata (B). Hadoop Streaming (C). Hadoop Stream (D). None of the mentioned Answer: b Explanation: Hadoop streaming is one of the most important utilities in the Apache Hadoop distribution.
22.__________ maps input key/value pairs to a set of intermediate key/value pairs. (A). Mapper (B). Reducer (C). Both Mapper and Reducer (D). None of the mentioned Answer: a Explanation: Maps are the individual tasks that transform input records into intermediate records.
23.The number of maps is usually driven by the total size of ____________ (A). inputs (B). outputs (C). tasks (D). None of the mentioned Answer: a Explanation: Total size of inputs means the total number of blocks of the input files.
24._________ is the default Partitioner for partitioning key space. (A). HashPar (B). Partitioner (C). HashPartitioner (D). None of the mentioned Answer: c Explanation: The default partitioner in Hadoop is the HashPartitioner which has a method called getPartition to partition.
25.Running a ___________ program involves running mapping tasks on many or all of the nodes in our cluster. (A). MapReduce (B). Map (C). Reducer (D). All of the mentioned Answer: a Explanation: In some applications, component tasks need to create and/or write to side-files, which differ from the actual job-output files.
**********Module - 3 (NoSQL)**********
1.Following represent column in NoSQL __________. (A). Database (B). Field (C). Document (D). Collection Answer -B
2.What is the aim of NoSQL? (A). NoSQL provides an alternative to SQL databases to store textual data. (B). NoSQL databases allow storing non-structured data. (C). NoSQL is not suitable for storing structured data. (D). NoSQL is a new data format to store large datasets. Answer- D
3.__________ is an online NoSQL developed by Cloudera. (A). HCatalog (B). Hbase (C). Impala (D). Oozie Answer-B
4.Which of the following is not a NoSQL database? (A). SQL Server (B). MongoDB (C). Cassandra (D). None of the mentioned Answer-A
5.Which of the following is a NoSQL Database Type? (A). SQL (B). Document databases (C). JSON (D). All of the mentioned Answer-B
6.Following represent column in NoSQL __________. (A). Database (B). Field (C). Document (D). Collection Answer-B
7.What is the aim of NoSQL? (A). NoSQL provides an alternative to SQL databases to store textual data. (B). NoSQL databases allow storing non-structured data. (C). NoSQL is not suitable for storing structured data. (D). NoSQL is a new data format to store large datasets. Answer-D
8.__________ is an online NoSQL developed by Cloudera.
(A). HCatalog (B). Hbase (C). Impala (D). Oozie Answer-B
9.Which of the following is not a NoSQL database? (A). SQL Server (B). MongoDB (C). Cassandra (D). None of the mentioned Answer-A
10.Which of the following is a NoSQL Database Type? (A). SQL (B). Document databases (C). JSON (D). All of the mentioned Answer-B
11.Following represent column in NoSQL __________. (A). Database (B). Field (C). Document (D). Collection Answer-B
12.What is the aim of NoSQL? (A). NoSQL provides an alternative to SQL databases to store textual data. (B). NoSQL databases allow storing non-structured data. (C). NoSQL is not suitable for storing structured data. (D). NoSQL is a new data format to store large datasets. Answer-D
13.__________ is an online NoSQL developed by Cloudera. (A). HCatalog (B). Hbase (C). Impala (D). Oozie Answer-B
14.Which of the following is not a NoSQL database? (A). SQL Server (B). MongoDB (C). Cassandra (D). None of the mentioned Answer-A
15.Which of the following is a NoSQL Database Type? (A). SQL (B). Document databases (C). JSON (D). All of the mentioned Answer-B
16.Following represent column in NoSQL __________. (A). Database (B). Field (C). Document (D). Collection Answer-B
17.What is the aim of NoSQL? (A). NoSQL provides an alternative to SQL databases to store textual data. (B). NoSQL databases allow storing non-structured data. (C). NoSQL is not suitable for storing structured data. (D). NoSQL is a new data format to store large datasets. Answer-D
18.__________ is an online NoSQL developed by Cloudera. (A). HCatalog (B). Hbase (C). Impala (D). Oozie Answer-B
19.Which of the following is not a NoSQL database? (A). SQL Server (B). MongoDB (C). Cassandra (D). None of the mentioned Answer-A
20.Which of the following is a NoSQL Database Type? (A). SQL (B). Document databases (C). JSON (D). All of the mentioned Answer-B
21.Following represent column in NoSQL __________. (A). Database (B). Field (C). Document (D). Collection Answer-B
22.What is the aim of NoSQL? (A). NoSQL provides an alternative to SQL databases to store textual data. (B). NoSQL databases allow storing non-structured data. (C). NoSQL is not suitable for storing structured data. (D). NoSQL is a new data format to store large datasets. Answer-D
23.__________ is an online NoSQL developed by Cloudera. (A). HCatalog (B). Hbase (C). Impala
(D). Oozie Answer-B
24.Which of the following is not a NoSQL database? (A). SQL Server (B). MongoDB (C). Cassandra (D). None of the mentioned Answer-A
25.Which of the following is a NoSQL Database Type? (A). SQL (B). Document databases (C). JSON (D). All of the mentioned Answer-B
**********Module - 4 (Mining Data Streams)**********
1.Bloom filter was proposed by : (A). Burton morris Bloom (B). Burton Howard Bloom (C). Burton Datar Bloom (D). Burton Howrd Bloom Answer : (B). Burton Howard Bloom
2. A simple space-efficient randomized data structure for representing a set in order to support membership queries (A). Bloom Filter (B). Flajolet Martin (C). DGIM (D). K-means Answer: (A). Bloom Filter
3.It is a web-based financial search engine that evaluates queries over real-time streaming financial data such as stock tickers and news feeds (A). Traderbot (B). Tradebot (C). Clickbot (D). Hyperbot Answer : (A). Traderbot
3.If the stream contains n elements with m of them unique, the FM algorithm needs a memory of ____. (A). O((m)) (B). O(log(m+1)) (C). O(log(m+2)) (D). O(log(m)) Answer : (D). O(log(m))
4.Calculate h(3). , given S=1,3,2,1,2,3,4,3,1,2,3,1 and h(x)=(6x+1). mod 5
(A). 19 (B). 10 (C). 15 (D). 16 Answer : (A). 19
5.According to Bloom filter principle, we should consider the potential effects of: (A). true positives (B). false negatives (C). false positives (D). true negatives Answer : (C). false positives
6.Who released a hash function named MurmurHash in 2008: (A). Datar Motwani (B). Austin Appleby (C). Marrianne Durrand (D). Burton Datar Bloom Answer : (B). Austin Appleby
8.The files on disks or records in databases need to be stored in Bloom filter as (A). keys (B). values (C). key-values (D). columns Answer : (C). key-values
9. If the stream contains n elements with m of them unique, the FM algorithm runs in ------------------ time (A). O(sq.rt(n)) (B). O(n+2) (C). O(n+1) (D). O(n) Answer : (D). O(n)
10.Given h(x). = x + 6 mod 32 , The binary value of h(4). : (A). 1011 (B). 1010 (C). 1110 (D). 1111 Answer : (B). 1010
11.Flajolet-Martin algorithm approximates the number of unique objects in a stream or a database in how many passes? (A). n (B). 0 (C). 1 (D). 2 Answer : (C). 1
12.What is important when the input rate of facebook data is controlled externally: (A). facebook Management (B). Query Management (C). Stream Management (D). data Management
Answer: (B). Query Management
13.Which algorithmn solution does not assume uniformity? (A). DGIM (B). FM (C). SON (D). K-MEANS Answer: (A). DGIM
14. Which query operator is unable to produce an answer until it has seen its entire input: (A). Blocking query operator (B). Discrete operator (C). continuous operator (D). Continuous operator and discrete queries Answer: (A). Blocking query operator
15. 000101 has tail length of ------- : (A). 1 (B). 2 (C). 3 (D). 0 Answer: (D). 0
16.In FM algorithm, The probability that a given h(a). ends in at least i 0’s is (A). 1 (B). 0 (C). 2^-i (D). i Answer : (C). 2^-i
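A small helper makes the tail-length idea concrete (illustrative R, not the full FM algorithm):
tail_length <- function(h) {            # number of trailing zeros in the binary form of h
  if (h == 0) return(0)
  n <- 0
  while (h %% 2 == 0) { h <- h %/% 2; n <- n + 1 }
  n
}
tail_length(10)   # 10 = 1010 in binary -> 1 trailing zero
tail_length(5)    #  5 =  101 in binary -> 0 trailing zeros
# FM estimates the number of distinct elements as roughly 2^R, where R is the largest tail length seen over the stream.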
17. Probability of a false positive in Bloom Filters depends on (A). the number of hash functions (B). the density of 1’s in the array (C). the number of hash functions and the density of 1’s in the array (D). the density of 0’s in the array Answer : (C). the number of hash functions and the density of 1’s in the array
18. It is an array of bits, together with a number of hash functions (A). Bloom filter (B). Hash Function (C). Data Stream (D). Binary input Answer : (A). Bloom filter
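A minimal Bloom-filter sketch in R; the bit-array size and the two hash functions are made up for illustration:
m <- 32                                   # size of the bit array
bits <- logical(m)
hashes <- list(function(x) (6 * x + 1) %% m,
               function(x) (3 * x + 7) %% m)
bf_add   <- function(x) for (h in hashes) bits[h(x) + 1] <<- TRUE
bf_query <- function(x) all(sapply(hashes, function(h) bits[h(x) + 1]))
bf_add(11); bf_add(25)
bf_query(11)   # TRUE: an inserted element is always reported present
bf_query(4)    # usually FALSE; a TRUE here would be a false positive (false negatives never occur)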
19. ______________ query is one that is supplied to the DSMS before any relevant data has arrived (A). Continuous queries and discrete queries (B). discrete queries (C). ad-hoc (D). pre-defined Answer : (D). pre-defined
20. Sorting used for query processing is an example of : (A). Blocking query operator (B). Blocking discrete operator (C). Blocking Continuous operator
(D). Continuous operator Answer : (A). Blocking query operator
**********Module - 5 (Finding Similar Items & Clustering)**********
1.PCY Stands for (A). Park-Chen-Yu (B). Park-Chen-You (C). Park-Check-Yu (D). Park-Check-You Answer : (A). Park-Chen-Yu
2. SON Algorithm Stands for (A). Shane,Omiecinski and Navathe (B). Savasere,Omiecinski and Navathe (C). Savare,Omienal and Navathe (D). Savasere,Omiecinski and Navarag Answer : (B). Savasere,Omiecinski and Navathe
3. Minimum Support=?,if total Transaction =5 and minimum Support=60% (A). 30 (B). 3 (C). 300 (D). 65 Answer : (B). 3
4.Minimum Support=?,if total Transaction =10 and minimum Support=60% (A). 6 (B). 0.6 (C). 10 (D). 5 Answer : (A). 6
5. How do you calculate Confidence(B -> A)? (A). Support(A ∪ B) / Support(A) (B). Support(A ∪ B) / Support(B) (C). Support(A) / Support(B) (D). Support(B) / Support(A) Answer : (B). Support(A ∪ B) / Support(B)
**********Module - 6 (Real Time Big Data Models)**********
1.Which of the following is true? (A). graph may contain no edges and many vertices
(B). graph may contain many edges and at least one vertex (C). graph may contain no edges and no vertices (D). graph may contain no vertices and many edges Answer : (B). graph may contain many edges and at least one vertex
2.Social Network is defined as (A). Collection of entities that participate in the network. (B). Collection of items in store (C). Collection of vertices & edges in a graph (D). Collection of nodes in a graph Answer : (A). Collection of entities that participate in the network.
3.Which of the following is finally produced by Hierarchical Clustering? (A). final estimate of cluster centroids (B). tree showing how close things are to each other (C). Assignment of each point to clusters (D). Assignment of each edges of clusters Answer : (B). tree showing how close things are to each other
4. Which of the following clustering requires merging approach? (A). Partitional (B). Hierarchical (C). Naive Bayes (D). K-means Answer : (B). Hierarchical
5.Which of the following function is used for k-means clustering? (A). K-means (B). Euclidean Distance (C). Heatmap (D). Correlation Similarity Answer: (A). K-means
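A minimal base-R example of k-means on the built-in iris data:
set.seed(42)
km <- kmeans(iris[, c("Sepal.Length", "Petal.Length")], centers = 3)
km$centers                        # the three cluster centroids
table(km$cluster, iris$Species)   # how the clusters line up with the species labels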
6.___________ was the pioneer in the field of web search with the use of PageRank for ranking Web pages with respect to a user query. (A). Yahoo (B). YouTube (C). Facebook (D). Google Answer : (D). Google
7.Which of the following algorithm is used by Google to determine the importance of a particular page? (A). SVD (B). PageRank (C). FastMap (D). All of the above Answer : (B). PageRank
8.One of the popular techniques of Spamdexing is ___________ (A). Clocking (B). Cooking (C). Cloaking (D). Crocking
Answer: (C). Cloaking
9.Doorway pages are_________ Web pages. (A). High quality (B). Low quality (C). Informative (D). High content Answer : (B). Low quality
10.PageRank helps in measuring ________________ of a Web page within a set of similar entries. (A). Relative importance (B). Size (C). Cost (D). All of the above Answer : (A). Relative importance
11.PageRank helps in measuring ________________ of a Web page within a set of similar entries. (A). Relative importance (B). Size (C). Cost (D). All of the above Answer : (A). Relative importance
12.Web pages with Dead ends means__________ (A). Pages with no outlinks (B). Pages with no PageRank (C). Pages with no contents (D). Pages with spam Answer : (A). Pages with no outlinks
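The idea behind PageRank can be sketched with power iteration on a made-up 3-page graph (illustrative R; column j of M is the outlink distribution of page j):
M <- matrix(c(0.0, 0.5, 0.5,    # page 1 links to pages 2 and 3
              0.5, 0.0, 0.5,    # page 2 links to pages 1 and 3
              1.0, 0.0, 0.0),   # page 3 links only to page 1 (no dead ends here)
            nrow = 3)
beta <- 0.85                    # damping factor
r <- rep(1/3, 3)                # start from a uniform rank vector
for (i in 1:50) r <- beta * (M %*% r) + (1 - beta) / 3
round(as.vector(r) / sum(r), 3) # relative importance of the three pages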
13.Topic Sensitive PageRank (TSPR). is proposed by_________ in 2003. (A). Al-Saffar (B). Bratislav V. Stojanović (C). Jianshu WENG (D). Taher H. Haveliwala Answer : (D). Taher H. Haveliwala
14.Full form of HITS is _____________ (A). High Influential Topic Search (B). High Informative Topic Search (C). Hyperlink-induced topic Search (D). None of the above Answer : (C). Hyperlink-induced topic Search
15.HITS algorithm and the PageRank algorithm both make use of the _________ to decide the relevance of the pages. (A). Link structure of the Web graph (B). Design of the Web graph (C). Content of the web pages (D). All of the above Answer : (A). Link structure of the Web graph
16.When the objective is to mine social network for patterns, a natural way to represent a social network is by a ___________
(A). Tree (B). Graph (C). Arrays (D). Lists Answer : (B). Graph
17.A social network can be considered as a___________ (A). Heterogeneous and multi relational dataset (B). LiveJournal (C). Twitter (D). DBLP Answer : (A). Heterogeneous and multi relational dataset
18.For an edge ‘e’ in a graph, ___________ of ‘e’ is defined as the number of shortest paths between all node pairs (vi, vj) in the graph such that the shortest path passes through ‘e’. (A). Edge path (B). Edge measure (C). Edge closeness (D). Edge betweenness Answer : (D). Edge betweenness
19.“You may also like these…”, “People who liked this also liked….”, this type of suggestions are from the ______________ (A). Filtering System (B). Collaborative System (C). Recommendation System (D). Amazon System Answer: (C). Recommendation System
20.An approach to a Recommendation system is to treat this as the _______________ problem using items profiles and utility matrices. (A). MapReduce (B). Social Network (C). Machine learning (D). Unstructured Answer : (C). Machine learning
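A tiny sketch of the similarity computation behind collaborative filtering (the ratings are made up):
ratings <- matrix(c(5, 3, 0,
                    4, 0, 4,
                    1, 1, 5),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("user", 1:3), c("itemA", "itemB", "itemC")))
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
cosine(ratings[, "itemA"], ratings[, "itemB"])   # item-item similarity behind "people who liked this also liked..."
cosine(ratings[, "itemA"], ratings[, "itemC"])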
***************Data Analytics MCQs Set - 1***************
1. The branch of statistics which deals with development of particular statistical methods
is classified as
1. industry statistics
2. economic statistics
3. applied statistics
4. applied statistics
Answer: applied statistics
2. Which of the following is true about regression analysis?
1. answering yes/no questions about the data
2. estimating numerical characteristics of the data
3. modeling relationships within the data
4. describing associations within the data
Answer: modeling relationships within the data
3. Text Analytics, also referred to as Text Mining?
1. True
2. False
3. Can be true or False
4. Can not say
Answer: True
4. What is a hypothesis?
1. A statement that the researcher wants to test through the data collected in a study.
2. A research question the results will answer.
3. A theory that underpins the study.
4. A statistical method for calculating the extent to which the results could have happened by
chance.
Answer: A statement that the researcher wants to test through the data collected in a study.
5. What is the cyclical process of collecting and analysing data during a single research
study called?
1. Interim Analysis
2. Inter analysis
3. inter item analysis
4. constant analysis
Answer: Interim Analysis
6. The process of quantifying data is referred to as ____
1. Topology
2. Digramming
3. Enumeration
4. coding
Answer: Enumeration
7. An advantage of using computer programs for qualitative data is that they _
1. Can reduce time required to analyse data (i.e., after the data are transcribed)
2. Help in storing and organising data
3. Make many procedures available that are rarely done by hand due to time constraints
4. All of the above
Answer: All of the Above
8. Boolean operators are words that are used to create logical combinations.
1. True
2. False
Answer: True
9. ______ are the basic building blocks of qualitative data.
1. Categories
2. Units
3. Individuals
4. None of the above
Answer: Categories
10. This is the process of transforming qualitative research data from written interviews or field notes into typed text.
1. Segmenting
2. Coding
3. Transcription
4. Mnemoning
Answer: Transcription
11. A challenge of qualitative data analysis is that it often includes data that are unwieldy and complex; it is a major challenge to make sense of the large pool of data.
1. True
2. False
Answer: True
12. Hypothesis testing and estimation are both types of descriptive statistics.
1. True
2. False
Answer: False
13. A set of data organised in a participants(rows)-by-variables(columns) format is known as a “data set.”
1. True
2. False
Answer: True
14. A graph that uses vertical bars to represent data is called a ___
1. Line graph
2. Bar graph
3. Scatterplot
4. Vertical graph
Answer: Bar graph
15. ____ are used when you want to visually examine the relationship between two
quantitative variables.
1. Bar graph
2. pie graph
3. line graph
4. Scatterplot
Answer: Scatterplot
16. The denominator (bottom) of the z-score formula is
1. The standard deviation
2. The difference between a score and the mean
3. The range
4. The mean
Answer: The standard deviation
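For example, standardising a small set of scores in R:
scores <- c(62, 70, 74, 80, 94)
z <- (scores - mean(scores)) / sd(scores)   # numerator: score minus mean; denominator: standard deviation
round(z, 2)
# scale(scores) produces the same standardised values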
17. Which of these distributions is used for testing a hypothesis?
1. Normal Distribution
2. Chi-Squared Distribution
3. Gamma Distribution
4. Poisson Distribution
Answer: Chi-Squared Distribution
18. A statement made about a population for testing purpose is called?
1. Statistic
2. Hypothesis
3. Level of Significance
4. Test-Statistic
Answer: Hypothesis
19. If the assumed hypothesis is tested for rejection considering it to be true is called?
1. Null Hypothesis
2. Statistical Hypothesis
3. Simple Hypothesis
4. Composite Hypothesis
Answer: Null Hypothesis
20. If the null hypothesis is false then which of the following is accepted?
1. Null Hypothesis
2. Positive Hypothesis
3. Negative Hypothesis
4. Alternative Hypothesis.
Answer: Alternative Hypothesis.
21. Alternative Hypothesis is also called as?
1. Composite hypothesis
2. Research Hypothesis
3. Simple Hypothesis
4. Null Hypothesis
Answer: Research Hypothesis
*************** Data Analytics MCQs Set – 2 ***************
1. What is the minimum no. of variables/ features required to perform clustering?
1. 0
2. 1
3. 2
4. 3
Answer: 1
2. For two runs of K-Mean clustering is it expected to get same clustering results?
1. Yes
2. No
Answer: No
3. Which of the following algorithms is most sensitive to outliers?
1. K-means clustering algorithm
2. K-medians clustering algorithm
3. K-modes clustering algorithm
4. K-medoids clustering algorithm
Answer: K-means clustering algorithm
4. The discrete variables and continuous variables are two types of
1. Open end classification
2. Time series classification
3. Qualitative classification
4. Quantitative classification
Answer: Quantitative classification
5. Bayesian classifiers is
1. A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
2. Any mechanism employed by a learning system to constrain the search space of a hypothesis
3. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
4. None of these
Answer: A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
6. Classification accuracy is
1. A subdivision of a set of examples into a number of classes
2. Measure of the accuracy, of the classification of a concept that is given by a certain theory
3. The task of assigning a classification to a set of examples
4. None of these
Answer: Measure of the accuracy, of the classification of a concept that is given by a certain theory
7. Euclidean distance measure is
1. A stage of the KDD process in which new data is added to the existing selection.
2. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
3. The distance between two points as calculated using the Pythagoras theorem
4. none of above
Answer: The distance between two points as calculated using the Pythagoras theorem
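A quick R check of the idea:
x <- c(2, 3); y <- c(5, 7)
sqrt(sum((x - y)^2))   # 5: straight-line distance from the Pythagorean theorem
dist(rbind(x, y))      # the built-in dist() gives the same value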
8. Hybrid is
1. Combining different types of method or information
2. Approach to the design of learning algorithms that is structured along the lines of the theory of evolution.
3. Decision support systems that contain an information base filled with the knowledge of an expert formulated in terms of if-then rules.
4. none of above
Answer: Combining different types of method or information
9. Decision trees use ________, in that they always choose the option that seems the best available at that moment.
1. Greedy Algorithms
2. divide and conquer
3. Backtracking
4. Shortest path algorithm
Answer: Greedy Algorithms
10. Discovery is
1. It is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information).
2. The process of extracting implicit, previously unknown and potentially useful information from data
3. An extremely complex molecule that occurs in human chromosomes and that carries genetic
information in the form of genes.
4. None of these
Answer: The process of extracting implicit, previously unknown and potentially useful information from data
11. Hidden knowledge referred to
1. A set of databases from different vendors, possibly using different database paradigms
2. An approach to a problem that is not guaranteed to work but performs well in most cases
3. Information that is hidden in a database and that cannot be recovered by a simple SQL query.
4. None of these
Answer: Information that is hidden in a database and that cannot be recovered by a simple SQL query.
12. Decision trees cannot handle categorical attributes with many distinct values, such as country codes for telephone numbers.
1. True
2. False
Answer: False
13. Enrichment is
1. A stage of the KDD process in which new data is added to the existing selection
2. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
3. The distance between two points as calculated using the Pythagoras theorem.
4. None of these
Answer: A stage of the KDD process in which new data is added to the existing selection
14. ________ are easy to implement and can execute efficiently even without prior knowledge of the data; they are among the most popular algorithms for classifying text documents.
1. ID3
2. Naive Bayes classifiers
3. CART
4. None of above
Answer: Naive Bayes classifiers
15. High entropy means that the partitions in classification are
1. Pure
2. Not Pure
3. Usefull
4. useless
Answer: Not Pure
16. Which of the following statements about Naive Bayes is incorrect?
1. Attributes are equally important.
2. Attributes are statistically dependent of one another given the class value.
3. Attributes are statistically independent of one another given the class value.
4. Attributes can be nominal or numeric
Answer: Attributes are statistically dependent of one another given the class value.
17. The maximum value for entropy depends on the number of classes so if we have 8 Classes what will be the max entropy.
1. Max Entropy is 1
2. Max Entropy is 2
3. Max Entropy is 3
4. Max Entropy is 4
Answer: Max Entropy is 3
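This follows from the entropy of a uniform distribution, as a quick R check shows:
p <- rep(1/8, 8)        # uniform distribution over 8 classes
-sum(p * log2(p))       # 3: the maximum entropy equals log2(number of classes)
log2(8)                 # 3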
18. Point out the wrong statement.
1. k-nearest neighbor is same as k-means
2. k-means clustering is a method of vector quantization
3. k-means clustering aims to partition n observations into k clusters
4. none of the mentioned
Answer: k-nearest neighbor is same as k-means
19. Consider the following example “How we can divide set of articles such that those articles have the same theme (we do not know the theme of the articles ahead of time) ” is this:
1. Clustering
2. Classification
3. Regression
4. None of these
Answer: Clustering
20. Can we use K Mean Clustering to identify the objects in video?
1. Yes
2. No
Answer: Yes
21. Clustering techniques are ________ in the sense that the data scientist does not determine, in advance, the labels to apply to the clusters.
1. Unsupervised
2. supervised
3. Reinforcement
4. Neural network
Answer: Unsupervised
22. ________ metric is examined to determine a reasonably optimal value of k.
1. Mean Square Error
2. Within Sum of Squares (WSS)
3. Speed
4. None of these
Answer: Within Sum of Squares (WSS)
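A common way to use WSS is the elbow plot; a minimal R sketch on the iris measurements:
set.seed(1)
d <- iris[, 1:4]
wss <- sapply(1:6, function(k) kmeans(d, centers = k, nstart = 10)$tot.withinss)
plot(1:6, wss, type = "b", xlab = "k", ylab = "Within Sum of Squares (WSS)")  # look for the "elbow"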
23. If an itemset is considered frequent, then any subset of the frequent itemset must also be frequent.
1. Apriori Property
2. Downward Closure Property
3. Either 1 or 2
4. Both 1 and 2
Answer: Both 1 and 2
24. if {bread,eggs,milk} has a support of 0.15 and {bread,eggs} also has a support of 0.15, the confidence of rule {bread,eggs} -> {milk} is
1. 0
2. 1
3. 2
4. 3
Answer: 1
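The arithmetic behind the answer:
support_XY <- 0.15                 # support of {bread, eggs, milk}
support_X  <- 0.15                 # support of the antecedent {bread, eggs}
support_XY / support_X             # confidence of {bread, eggs} -> {milk} = 1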
25. Confidence is a measure of how X and Y are really related rather than coincidentally happening together.
1. True
2. False
Answer: False
26. ________ recommend items based on similarity measures between users and/or items.
1. Content Based Systems
2. Hybrid System
3. Collaborative Filtering Systems
4. None of these
Answer: Collaborative Filtering Systems
27. There are ______ major classifications of Collaborative Filtering Mechanisms
1. 1
2. 2
3. 3
4. none of above
Answer: 2
28. Movie Recommendation to people is an example of
1. User Based Recommendation
2. Item Based Recommendation
3. Knowledge Based Recommendation
4. content based recommendation
Answer: Item Based Recommendation
29. ________ recommenders rely on an explicitly defined set of recommendation rules
1. Constraint Based
2. Case Based
3. Content Based
4. User Based
Answer: Constraint Based
30. Parallelized hybrid recommender systems operate dependently of one another and produce separate recommendation lists.
1. True
2. False
Answer: False
Data Analytics
Unit 1:
21. Facebook Tackles Big Data With ____ based on Hadoop 1. Project Prism 2. Prism 3. ProjectData 4. ProjectBid
22. Which of the following is not a phase of Data Analytics Life Cycle? 1. Communication 2. Recall 3. Data Preparation 4. Model Planning
UNIT 2: DATA ANALYSIS
1. In regression, the equation that describes how the response variable (y) is related to the explanatory variable (x) is: a. the correlation model b. the regression model c. used to compute the correlation coefficient d. None of these alternatives is correct.
2. The relationship between number of beers consumed (x) and blood alcohol content (y) was studied in 16 male college students by using least squares regression. The following regression equation was obtained from this study: ŷ = -0.0127 + 0.0180x The above equation implies that:
a. each beer consumed increases blood alcohol by 1.27%
b. on average it takes 1.8 beers to increase blood alcohol content by 1%
c. each beer consumed increases blood alcohol by an average of amount of 1.8%
d. each beer consumed increases blood alcohol by exactly 0.018
3. SSE can never be
a. larger than SST
b. smaller than SST
c. equal to 1
d. equal to zero
4. Regression modeling is a statistical framework for developing a mathematical equation that describes how
a. one explanatory and one or more response variables are related
b. several explanatory and several response variables response are related
c. one response and one or more explanatory variables are related
d. All of these are correct.
5. In regression analysis, the variable that is being predicted is the
a. response, or dependent, variable
b. independent variable
c. intervening variable
d. is usually x
6 Regression analysis was applied to return rates of sparrowhawk colonies. Regression analysis was used to study the relationship between return rate (x: % of birds that return to the colony in a given year) and immigration rate (y: % of new adults that join the colony per year). The following regression equation
was obtained: ŷ = 31.9 – 0.34x Based on the above estimated regression equation, if the return rate were to decrease by 10% the rate of immigration to the colony would:
a. increase by 34%
b. increase by 3.4%
c. decrease by 0.34%
d. decrease by 3.4%
7. In least squares regression, which of the following is not a required assumption about the error term ε?
a. The expected value of the error term is one.
b. The variance of the error term is the same for all values of x.
c. The values of the error term are independent.
d. The error term is normally distributed.
8. Larger values of r² (R²) imply that the observations are more closely grouped about the
a. average value of the independent variables
b. average value of the dependent variable
c. least squares line
d. origin
9. In a regression analysis if r² = 1, then
a. SSE must also be equal to one
b. SSE must be equal to zero
c. SSE can be any positive value
d. SSE must be negative
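A quick R illustration, using synthetic data that lies exactly on a line so the fit is perfect (r² = 1, SSE = 0):
x <- 1:10
y <- 2 + 3 * x                  # points fall exactly on a line
fit <- lm(y ~ x)
summary(fit)$r.squared          # 1
sum(residuals(fit)^2)           # SSE is (numerically) zero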
10.Which type of multivariate analysis should be used when a researcher wants to reduce a set of variables to a smaller set of composite variables by identifying underlying dimensions of the data?
A)Conjoint analysis
B)Cluster analysis
C)Multiple regression analysis
D)Factor analysis
11. Which type of multivariate analysis should be used when a researcher wants to estimate the utility that consumers associate with different product features?
A)Conjoint analysis
B)Cluster analysis
C)Multiple regression analysis
D)Factor analysis
12. Which type of multivariate analysis should be used when a researcher wants to identify subgroups of individuals that are homogeneous within subgroups and different from other subgroups?
A)Conjoint analysis
B)Cluster analysis
C)Multiple regression analysis
D)Factor analysis
13. Which type of multivariate analysis should be used when a researcher wants to predict group membership on the basis of two or more independent variables?
A)Conjoint analysis
B)Cluster analysis
C)Multiple regression analysis
D)Multiple discriminant analysis
14. Support vector machine (SVM) is a _________ classifier? a) Discriminative b) Generative
15. SVM can be used to solve ___________ problems. a) Classification b) Regression c) Clustering d) Both Classification and Regression
16. SVM is a ___________ learning algorithm. a) Supervised b) Unsupervised
17. SVM is termed as a ________ classifier. a) Minimum margin b) Maximum margin
18. The training examples closest to the separating hyperplane are called _______. a) Training vectors b) Test vectors
19. A factor analysis is…, while a principal components analysis is…
A broad term, the most commonly used technique for doing factor analysis.
B The most commonly used technique for doing factor analysis, a broad term.
C Both of the above.
D NONE OF THE ABOVE
20. Dimension Reduction is defined as-
A. It is a process of converting a data set having vast dimensions into a data set with lesser dimensions.
B. It ensures that the converted data set conveys similar information concisely.
C. All of the above
D. None of the above
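A minimal dimension-reduction sketch in Python, assuming scikit-learn is available; the random data and the choice of 3 components are illustrative only:

    # Minimal sketch of dimension reduction with PCA (scikit-learn assumed installed).
    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(100, 10)        # 100 observations, 10 original dimensions
    pca = PCA(n_components=3)          # keep 3 composite variables (components)
    X_reduced = pca.fit_transform(X)   # same observations, fewer dimensions

    print(X_reduced.shape)                  # (100, 3)
    print(pca.explained_variance_ratio_)    # how much information each component retains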
21. What is the form of Fuzzy logic? a) Two-valued logic b) Crisp set logic c) Many-valued logic d) Binary set logic
22. Traditional set theory is also known as Crisp Set theory. a) True b) False
23. The truth values of traditional set theory are ____________ and those of fuzzy sets are __________ a) Either 0 or 1, between 0 & 1 b) Between 0 & 1, either 0 or 1 c) Between 0 & 1, between 0 & 1 d) Either 0 or 1, either 0 or 1
24. Fuzzy logic is an extension of crisp set theory that handles the concept of partial truth. a) True b) False
25. The room temperature is hot. Here 'hot' (a linguistic variable) can be represented by _______ a) Fuzzy Set b) Crisp Set c) Fuzzy & Crisp Set
d) None of the mentioned
26. The values of the set membership is represented by ___________ a) Discrete Set b) Degree of truth c) Probabilities d) Both Degree of truth & Probabilities
27. Japanese were the first to utilize fuzzy logic practically on high-speed trains in Sendai. a) True b) False
28. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the following. a) AND b) OR c) NOT d) All of the mentioned
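A short Python sketch of the usual min/max/complement definitions of the fuzzy AND, OR and NOT operators mentioned above (the membership degrees 0.7 and 0.4 are made up):

    # Sketch of the common fuzzy AND / OR / NOT (min, max, complement) on membership degrees.
    a, b = 0.7, 0.4          # degrees of membership, each between 0 and 1

    fuzzy_and = min(a, b)    # 0.4
    fuzzy_or  = max(a, b)    # 0.7
    fuzzy_not = 1 - a        # 0.3 (complement)

    print(fuzzy_and, fuzzy_or, fuzzy_not)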
29. There are also other operators, more linguistic in nature, called __________ that can be applied to fuzzy set theory. a) Hedges b) Lingual Variable c) Fuzz Variable d) None of the mentioned
30. Fuzzy logic is usually represented as ___________ a) IF-THEN-ELSE rules b) IF-THEN rules c) Both IF-THEN-ELSE rules & IF-THEN rules d) None of the mentioned
31. Like relational databases there does exists fuzzy relational databases. a) True b) False
32. ______________ is/are the way/s to represent uncertainty. a) Fuzzy Logic b) Probability c) Entropy d) All of the mentioned
33. ____________ are algorithms that learn from their more complex environments (hence eco) to generalize, approximate and simplify solution logic. a) Fuzzy Relational DB b) Ecorithms c) Fuzzy Set d) None of the mentioned
Unit 3:
1 : What do you mean by sampling of stream data?
1. Sampling reduces the amount of data fed to a subsequent data mining algorithm. 2. Sampling reduces the diversity of the data stream. 3. Sampling aims to keep statistical properties of the data intact. 4. Sampling algorithms often don't need multiple passes over the data.
Question 2 : if Distance measure d(x, y)= d(y, x) then it is called
1. Symmetric 2. identical 3. positiveness 4. triangle inequality
Question 3 : NOSQL is
1. Not only SQL 2. Not SQL 3. Not Over SQL 4. No SQL
Question 4 : Find the L1 and L2 distances between the points (5, 6, 7) and (8, 2, 4).
1. L1 =10 , L2 = 5.83 2. L1 =10 , L2 = 5 3. L1 =11 , L2 = 4.9
4. L1 =9 , L2 = 5.83
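A worked Python check of Question 4 (the point coordinates come from the question; everything else is illustration):

    # L1 (Manhattan) and L2 (Euclidean) distances between (5, 6, 7) and (8, 2, 4).
    import math

    x = (5, 6, 7)
    y = (8, 2, 4)

    l1 = sum(abs(a - b) for a, b in zip(x, y))               # |5-8| + |6-2| + |7-4| = 10
    l2 = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))  # sqrt(9 + 16 + 9) ~ 5.83

    print(l1, round(l2, 2))   # 10 5.83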
Question 5 : The time between elements of one stream
1. need not be uniform 2. need to be uniform 3. must be 1ms. 4. must be 1ns
Question 6 : A Reduce task receives
1. one or more keys and their associated value list 2. key value pair 3. list of keys and their associated values 4. list of key value pairs
Question 7 : Which of the following statements about data streaming is true?
1. Stream data is always unstructured data. 2. Stream data often has a high velocity. 3. Stream elements cannot be stored on disk. 4. Stream data is always structured data.
Question 8 : Hadoop is the solution for:
1. Database software 2. Big Data Software 3. Data Mining software 4. Distribution software
Question 9 : ETL stands for ________________
1. Extraction transformation and loading 2. Extract Taken Lend 3. Enterprise Transfer Load 4. Entertainment Transference Load
Question 10 : “Sharding” a database across many server instances can be achieved with _______________
1. MAN 2. LAN 3. WAN 4. SAN
Question 11 : Neo4j is an example of which of the following NoSQL architectural pattern?
1. Key-value store 2. Graph Store 3. Document Store 4. Column-based Store
Question 12 : CSV and JSON can be described as
1. Structured data 2. Unstructured data 3. Semi-structured data 4. Multi-structured data
Question 13 : The hardware term used to describe Hadoop hardware requirements is
1. Commodity firmware
2. Commodity software 3. Commodity hardware 4. Cluster hardware
Question 14 : Which of the following is not a Hadoop Distributions?
1. MAPR 2. Cloudera 3. Hortonworks 4. RMAP
Question 15 : Which of the following Operation can be implemented with Combiners?
1. Selection 2. Projection 3. Natural Join 4. Union
Question 16 : ________ stores are used to store information about networks, such as social connections.
1. Key-value 2. Wide-column 3. Document 4. graph
Question 17 : The DGIM algorithm was developed to estimate the count of 1's occurring within the last k bits of a stream window of size N. Which of the following statements is true about the estimate of the number of 0's based on DGIM?
1. The number of 0's cannot be estimated at all. 2. The number of 0's can be estimated with a maximum guaranteed error
3. To estimate the number of 0s and 1s with a guaranteed maximum error, DGIM has to be employed twice, once creating buckets based on 1's and once creating buckets based on 0's. 4. Determine whether an element has already occurred in previous stream data.
Question 18 : If size of file is 4 GB and block size is 64 MB then number of mappers required for MapReduce task is
1. 8 2. 16 3. 32 4. 64
Question 19 : Which of the following is not the default daemon of Hadoop?
1. Namenode 2. Datanode 3. Job Tracker 4. Job history server
Question 20 : In Bloom filter an array of n bits is initialized with
1. all 0s 2. all 1s 3. half 0s and half 1s 4. all -1
Question 21 : _____________is a batch-based, distributed computing framework modeled after Google’s paper.
1. MapCompute 2. MapReuse 3. MapCluster
4. MapReduce
Question 22 : What is the edit distance between A=father and B=feather ?
1. 5 2. 1 3. 4 4. 2
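A small Python sketch of Levenshtein edit distance as one common way to compute the answer to Question 22; an insert/delete-only edit distance gives the same value for this pair:

    # Levenshtein edit distance; "father" -> "feather" needs a single insertion.
    def edit_distance(a, b):
        # dp[i][j] = edit distance between a[:i] and b[:j]
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(len(a) + 1):
            dp[i][0] = i
        for j in range(len(b) + 1):
            dp[0][j] = j
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                               dp[i][j - 1] + 1,         # insertion
                               dp[i - 1][j - 1] + cost)  # substitution
        return dp[-1][-1]

    print(edit_distance("father", "feather"))   # 1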
Question 23 : Sliding window operations typically fall in the category
1. OLTP Transactions 2. Big Data Batch Processing 3. Big Data Real Time Processing 4. Small Batch Processing
Question 24 : _________ systems focus on the relationship between users and items for recommendation.
1. DGIM 2. Collaborative-Filtering 3. Content Based and Collaborative Filtering 4. Content Based
Question 25 : Find Hamming Distance for vectors A=100101011 B=100010010
1. 2 2. 4 3. 3 4. 1
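A one-line Python check of Question 25, counting the positions where the two bit vectors differ:

    # Hamming distance between the two bit strings from the question.
    a = "100101011"
    b = "100010010"
    hamming = sum(1 for x, y in zip(a, b) if x != y)
    print(hamming)   # 4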
Question 26 : During start up, the ___________ loads the file system state from the fsimage and the edits log file.
1. Datanode 2. Namenode 3. Secondary Namenode 4. Rack awareness policy
Question 27 : What is finally produced by Hierarchical Agglomerative Clustering?
1. final estimate of cluster centroids 2. assignment of each point to clusters 3. tree showing how close things are to each other 4. Group of clusters
Question 28 : The Jaccard similarity of two non-binary sets A and B, is defined by__________
1. Jaccard Index 2. Primary Index 3. Secondary Index 4. Clustered Index
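For illustration, a tiny Python sketch of the Jaccard index |A ∩ B| / |A ∪ B| on made-up sets:

    # Jaccard similarity of two sets: size of intersection divided by size of union.
    A = {1, 2, 3, 4}
    B = {3, 4, 5}
    jaccard = len(A & B) / len(A | B)
    print(jaccard)   # 2 / 5 = 0.4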
Question 29 : Following is based on grid like street geography of the New York:
1. Manhattan Distance 2. Edit Distance 3. Hamming distance 4. Lp distance
Question 30 : The FM-sketch algorithm can be used to:
1. Estimate the number of distinct elements. 2. Sample data with a time-sensitive window. 3. Estimate the frequent elements. 4. Determine whether an element has already occurred in previous stream data.
Question 31 : Pick a hash function h that maps each of the N elements to at least log₂N bits. If R is the maximum number of trailing zeros observed in the hashed values, the estimated number of distinct elements is
1. 2^R 2. 2^(-R) 3. 1-(2^R) 4. 1-(2^(-R))
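A rough Python sketch of the Flajolet-Martin idea behind Question 31; the stream and the use of Python's built-in hash are illustrative assumptions, not part of the question:

    # Hash each element, track the maximum number R of trailing zeros seen,
    # and estimate the number of distinct elements as 2^R.
    def trailing_zeros(n):
        if n == 0:
            return 0
        count = 0
        while n % 2 == 0:
            n //= 2
            count += 1
        return count

    stream = [4, 7, 4, 9, 7, 1, 4, 12, 9, 7]     # illustrative stream
    R = 0
    for element in stream:
        h = hash(element) & 0xFFFFFFFF           # map the element to a bit string
        R = max(R, trailing_zeros(h))

    print(2 ** R)    # estimated number of distinct elements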
Question 32 : Which of the following is not a characteristic of stream data?
1. Continuous 2. ordered 3. persistent 4. huge
Question 33 : Which of the following is a column-oriented database that runs on top of HDFS
1. Hive 2. Sqoop 3. Hbase 4. Flume
Question 34 : Which of the following decides the number of partitions that are created on the local file system of the worker nodes?
1. Number of map tasks 2. Number of reduce tasks 3. Number of file input splits 4. Number of distinct keys in the intermediate key-value pairs
Question 35 : Which of the following is not the class of points in BFR algorithm
1. Discard Set (DS) 2. Compression Set (CS) 3. Isolation Set (IS) 4. Retained Set (RS)
Question 36 : Which of the following is not true for 5v?
1. Volume 2. variable 3. Velocity 4. value
Question 37 : Which algorithm is used to find fully connected subgraphs in social media mining?
1. CURE 2. CPM 3. SimRank 4. Girvan-Newman Algorithm
Question 38 : A ________________ query Q is a query that is issued once over a database D, and then logically runs continuously over the data in D until Q is terminated.
1. One-time Query 2. Standing Query 3. Adhoc Query
4. General Query
Question 39 : Effect of Spider trap on page rank
1. a particular page gets the highest page rank 2. all the pages of the web will get 0 page rank 3. no effect on any page 4. affects a particular set of pages
Question 40 : Which of the following is correct option for MongoDB
1. MongoDB is column oriented data store 2. MongoDB uses XML more in comparison with JSON 3. MongoDB is a document store database 4. MongoDB is a key-value data store
Question 41 : _________ systems focus on the relationship between users and items for recommendation.
1. DGIM 2. Collaborative-Filtering 3. Content Based and Collaborative Filtering 4. Content Based
Question 42 : The graphical representation of an SNA is made up of links and _____________.
1. People 2. Networks 3. Nodes 4. Computers
Question 43 : Hadoop is a framework that works with a variety of related tools. Common hadoop ecosystem include ____________
1. MapReduce, Hummer and Iguana 2. MapReduce, Hive and HBase 3. MapReduce, MySQL and Google Apps 4. MapReduce, Heron and Trumpet
Question 44 : About data streaming, Which of the following statements is true?
1. Stream data is always unstructured data. 2. Stream data often has a high velocity. 3. Stream elements cannot be stored on disk. 4. Stream data is always structured data.
Question 45 : Which of the following is a NoSQL Database Type ?
1. SQL 2. JSON 3. Document databases 4. CSV
Question 46 : Techniques for fooling search engines into believing your page is about something it is not, are called _____________.
1. term spam 2. page rank 3. phishing 4. dead ends
Question 47 : The police set up checkpoints at randomly selected road locations, then inspected every driver at those locations. What type of sample is this?
1. Simple Random Sample 2. Stratified Random Sample 3. Cluster Random Sample 4. Uniform sampling
Question 48 : Which of the following statements about standard Bloom filters is correct?
1. It is possible to delete an element from a Bloom filter. 2. A Bloom filter always returns the correct result. 3. It is possible to alter the hash functions of a full Bloom filter to create more space. 4. A Bloom filter always returns TRUE when testing for a previously added element.
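A minimal Bloom-filter sketch in Python illustrating Questions 20 and 48: the bit array starts as all 0s, added elements always test TRUE, false positives are possible, and deletion is not supported. The hashing scheme below is an illustrative assumption:

    # Minimal Bloom filter: n bits, all initialised to 0, k hash functions per element.
    import hashlib

    n = 64                       # number of bits
    bits = [0] * n               # initialised with all 0s

    def positions(item, k=3):
        # derive k bit positions from k salted hashes (illustrative only)
        return [int(hashlib.sha256(f"{i}{item}".encode()).hexdigest(), 16) % n
                for i in range(k)]

    def add(item):
        for p in positions(item):
            bits[p] = 1

    def might_contain(item):
        return all(bits[p] for p in positions(item))

    add("alice")
    print(might_contain("alice"))   # True (always true for added elements)
    print(might_contain("bob"))     # usually False, but can be a false positive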
Question 49 : Which of the following is responsible for managing the cluster resources and use them for scheduling users’ applications?
1. Hadoop Common 2. YARN 3. HDFS 4. MapReduce
Question 50 : ___________ is related to inconsistency in the data, which hampers the data analysis process and creates hurdles for those who wish to analyze this form of data.
1. Variability 2. Variety 3. Volume 4. Complexity
Unit 4:
Question 1 This clustering algorithm terminates when mean values computed for the current iteration of the algorithm are identical to the computed mean values for the previous iteration Select one:
a. K-Means clustering
b. conceptual clustering
c. expectation maximization
d. agglomerative clustering Show Answer
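A minimal one-dimensional K-means sketch in Python showing the termination rule in Question 1: the loop stops when the newly computed means are identical to those of the previous iteration (data and initial centroids are made up):

    # Tiny 1-D K-means with k = 2, stopping when the means no longer change.
    data = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
    means = [1.0, 9.0]                        # illustrative initial centroids

    while True:
        clusters = [[], []]
        for x in data:                        # assign each point to the nearest mean
            idx = min(range(2), key=lambda i: abs(x - means[i]))
            clusters[idx].append(x)
        new_means = [sum(c) / len(c) for c in clusters]
        if new_means == means:                # identical to the previous iteration -> stop
            break
        means = new_means

    print(means)   # [1.5, 8.5]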
Question 2 This clustering approach initially assumes that each data instance represents a single cluster. Select one:
a. expectation maximization
b. K-Means clustering
c. agglomerative clustering
d. conceptual clustering Show Answer
Question 3 The correlation coefficient for two real-valued attributes is – 0.85. What does this value tell you? Select one:
a. The attributes are not linearly related.
b. As the value of one attribute decreases the value of the second attribute increases.
c. As the value of one attribute increases the value of the second attribute also increases.
d. The attributes show a linear relationship Show Answer
Question 4 Time Complexity of k-means is given by Select one:
a. O(mn)
b. O(tkn)
c. O(kn)
d. O(t2kn) Show Answer
Question 5 Given a rule of the form IF X THEN Y, rule confidence is defined as the conditional probability that Select one:
a. Y is false when X is known to be false.
b. Y is true when X is known to be true.
c. X is true when Y is known to be true
d. X is false when Y is known to be false.
Question 6 Chameleon is Select one:
a. Density based clustering algorithm
b. Partitioning based algorithm
c. Model based algorithm
d. Hierarchical clustering algorithm
Question 7 In _________ clusterings, points may belong to multiple clusters Select one:
a. Non-exclusive
b. Partial
c. Fuzzy
d. Exclusive Show Answer
Question 8 Find odd man out Select one:
a. DBSCAN
b. K mean
c. PAM
d. K medoid
Question 9 Which statement is true about the K-Means algorithm? Select one:
a. The output attribute must be categorical.
b. All attribute values must be categorical.
c. All attributes must be numeric
d. Attribute values may be either categorical or numeric
Question 10 This data transformation technique works well when minimum and maximum values for a real-valued attribute are known. Select one:
a. z-score normalization
b. min-max normalization
c. logarithmic normalization
d. decimal scaling
Question 11 The number of iterations in apriori ___________ Select one:
a. increases with the size of the data
b. decreases with the increase in size of the data
c. increases with the size of the maximum frequent set
d. decreases with increase in size of the maximum frequent set Show Answer
Question 12 Which of the following are interestingness measures for association rules? Select one:
a. recall
b. lift
c. accuracy
d. compactness Show Answer
Question 13 Which one of the following is not a major strength of the neural network approach? Select one:
a. Neural network learning algorithms are guaranteed to converge to an optimal solution
b. Neural networks work well with datasets containing noisy data.
c. Neural networks can be used for both supervised learning and unsupervised clustering
d. Neural networks can be used for applications that require a time element to be included in the data Show Answer
Question 14 Find odd man out Select one:
a. K medoid
b. K mean
c. DBSCAN
d. PAM
Question 15 Given a frequent itemset L, if |L| = k, then there are Select one:
a. 2^k – 1 candidate association rules
b. 2^k candidate association rules
c. 2k – 2 candidate association rules
d. 2^k – 2 candidate association rules Show Answer
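A short Python sketch illustrating why a frequent itemset of size k yields 2^k - 2 candidate rules (shown here for k = 3; the item names are made up):

    # Every non-empty proper subset becomes an antecedent; its complement is the consequent.
    from itertools import combinations

    L = frozenset({"A", "B", "C"})           # k = 3
    rules = []
    for r in range(1, len(L)):               # proper, non-empty subsets only
        for antecedent in combinations(L, r):
            consequent = L - set(antecedent)
            rules.append((set(antecedent), consequent))

    print(len(rules))                         # 2**3 - 2 = 6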
Question 16 . _________ is an example for case based-learning Select one:
a. Decision trees
b. Neural networks
c. Genetic algorithm
d. K-nearest neighbor Show Answer
Question 17 The average positive difference between computed and desired outcome values. Select one:
a. mean positive error
b. mean squared error
c. mean absolute error
d. root mean squared error Show Answer
Question 18 Frequent item sets is Select one:
a. Superset of only closed frequent item sets
b. Superset of only maximal frequent item sets
c. Subset of maximal frequent item sets
d. Superset of both closed frequent item sets and maximal frequent item sets Show Answer
Question 19 Assume that we have a dataset containing information about 200 individuals. A supervised data mining session has discovered the following rule: IF age < 30 & credit card insurance = yes THEN life insurance = yes, with Rule Accuracy: 70% and Rule Coverage: 63%. How many individuals in the class life insurance = no have credit card insurance and are less than 30 years old? Select one:
a. 63
b. 30
c. 38
d. 70 Show Answer
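A worked arithmetic sketch for Question 19, under the usual reading that coverage is the fraction of all 200 individuals matched by the rule's antecedent and accuracy is the fraction of those classified correctly:

    # Coverage and accuracy arithmetic for the rule in Question 19.
    covered   = 0.63 * 200        # 126 individuals under 30 with credit card insurance
    correct   = 0.70 * covered    # ~88 of them actually have life insurance = yes
    incorrect = covered - correct # ~38 have life insurance = no
    print(round(incorrect))       # 38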
Question 20 Use the three-class confusion matrix below to answer: what percent of the instances were correctly classified?
                Computed Decision
           Class 1   Class 2   Class 3
Class 1       10         5         3
Class 2        5        15         3
Class 3        2         2         5
Select one:
a. 60
b. 40
c. 50
d. 30 Show Answer
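A worked Python check of Question 20, taking overall accuracy as the sum of the diagonal divided by the total count:

    # Accuracy from the three-class confusion matrix.
    matrix = [[10,  5, 3],
              [ 5, 15, 3],
              [ 2,  2, 5]]
    correct = sum(matrix[i][i] for i in range(3))   # 10 + 15 + 5 = 30
    total   = sum(sum(row) for row in matrix)       # 50
    print(100 * correct / total)                    # 60.0 percent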
Question 21 Which of the following is cluster analysis? Select one:
a. Simple segmentation
b. Grouping similar objects
c. Labeled classification
d. Query results grouping Show Answer
Question 22 A good clustering method will produce high quality clusters with Select one:
a. high inter class similarity
b. low intra class similarity
c. high intra class similarity
d. no inter class similarity Show Answer
Question 23 Which two parameters are needed for DBSCAN Select one:
a. Min threshold
b. Min points and eps
c. Min sup and min confidence
d. Number of centroids Show Answer
Question 24 Which statement is true about neural network and linear regression models? Select one:
a. Both techniques build models whose output is determined by a linear sum of weighted input attribute values.
b. The output of both models is a categorical attribute value.
c. Both models require numeric attributes to range between 0 and 1.
d. Both models require input attributes to be numeric. Show Answer
Question 25 In Apriori algorithm, if 1 item-sets are 100, then the number of candidate 2 item-sets are Select one:
a. 100
b. 4950
c. 200
d. 5000
Show Answer
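A quick check of Question 25: with 100 frequent 1-itemsets, the candidate 2-itemsets are all pairs, C(100, 2):

    # Number of candidate 2-itemsets generated from 100 frequent 1-itemsets.
    from math import comb
    print(comb(100, 2))   # 100 * 99 / 2 = 4950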
Question 26 Significant Bottleneck in the Apriori algorithm is Select one:
a. Finding frequent itemsets
b. Pruning
c. Candidate generation
d. Number of iterations Show Answer
Question 27 The concept of core, border and noise points fall into this category? Select one:
a. DENCLUE
b. Subspace clustering
c. Grid based
d. DBSCAN Show Answer
Question 28 The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you? Select one:
a. The attributes show a linear relationship
b. The attributes are not linearly related.
c. As the value of one attribute increases the value of the second attribute also increases.
d. As the value of one attribute decreases the value of the second attribute increases. Show Answer
Question 29 Machine learning techniques differ from statistical techniques in that machine learning methods Select one:
a. are better able to deal with missing and noisy data
b. typically assume an underlying distribution for the data
c. have trouble with large-sized datasets
d. are not able to explain their behavior. Show Answer
Question 30 The probability of a hypothesis before the presentation of evidence. Select one:
a. a priori
b. posterior
c. conditional
d. subjective Show Answer
Question 31 KDD represents extraction of Select one:
a. data
b. knowledge
c. rules
d. model Show Answer
Question 32 Which statement about outliers is true? Select one:
a. Outliers should be part of the training dataset but should not be present in the test data.
b. Outliers should be identified and removed from a dataset.
c. The nature of the problem determines how outliers are used
d. Outliers should be part of the test dataset but should not be present in the training data. Show Answer
Question 33 The most general form of distance is Select one:
a. Manhattan
b. Euclidean
c. Mean
d. Minkowski Show Answer
Question 34 Arbitrary shaped clusters can be found by using Select one:
a. Density methods
b. Partitional methods
c. Hierarchical methods
d. Agglomerative Show Answer
Question 35 Which Association Rule would you prefer? Select one:
a. High support and medium confidence
b. High support and low confidence
c. Low support and high confidence
d. Low support and low confidence Show Answer
Question 36 With Bayes theorem, the probability of hypothesis H, specified by P(H), is referred to as Select one:
a. a conditional probability
b. an a priori probability
c. a bidirectional probability
d. a posterior probability Show Answer
Question 37 In a Rule based classifier, If there is a rule for each combination of attribute values, what do you called that rule set R Select one:
a. Exhaustive
b. Inclusive
c. Comprehensive
d. Mutually exclusive Show Answer
Question 38 The apriori property means Select one:
a. If a set cannot pass a test, its supersets will also fail the same test
b. To decrease the efficiency, do level-wise generation of frequent item sets
c. To improve the efficiency, do level-wise generation of frequent item sets
d. If a set can pass a test, its supersets will fail the same test Show Answer
Question 39 If an item set ‘XYZ’ is a frequent item set, then all subsets of that frequent item set are Select one:
a. Undefined
b. Not frequent
c. Frequent
d. Can not say Show Answer
Question 40 Clustering is ___________ and is example of ____________learning Select one:
a. Predictive and supervised
b. Predictive and unsupervised
c. Descriptive and supervised
d. Descriptive and unsupervised Show Answer
Question 41 The probability that a person owns a sports car given that they subscribe to automotive magazine is 40%. We also know that 3% of the adult population subscribes to automotive magazine. The probability of a person owning a sports car given that they don't subscribe to automotive magazine is 30%. Use this information to compute the probability that a person subscribes to automotive magazine given that they own a sports car. Select one:
a. 0.0368
b. 0.0396
c. 0.0389
d. 0.0398 Show Answer
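A worked Bayes' theorem computation for Question 41 (M = subscribes to the magazine, S = owns a sports car):

    # P(M | S) via Bayes' theorem with the probabilities given in the question.
    p_s_given_m     = 0.40   # P(S | M)
    p_m             = 0.03   # P(M)
    p_s_given_not_m = 0.30   # P(S | not M)

    p_s = p_s_given_m * p_m + p_s_given_not_m * (1 - p_m)   # total probability of S
    p_m_given_s = p_s_given_m * p_m / p_s                   # Bayes' theorem

    print(round(p_m_given_s, 4))   # 0.0396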
Question 42 Simple regression assumes a __________ relationship between the input attribute and output attribute. Select one:
a. quadratic
b. inverse
c. linear
d. reciprocal Show Answer
Question 43 Which of the following algorithm comes under the classification Select one:
a. Apriori
b. Brute force
c. DBSCAN
d. K-nearest neighbor Show Answer
Question 44 Hierarchical agglomerative clustering is typically visualized as? Select one:
a. Dendrogram
b. Binary trees
c. Block diagram
d. Graph Show Answer
Question 45 The _______ step eliminates the extensions of (k-1)-itemsets which are not found to be frequent from being considered for counting support. Select one:
a. Partitioning
b. Candidate generation
c. Itemset eliminations
d. Pruning Show Answer
Question 46 To determine association rules from frequent item sets Select one:
a. Only minimum confidence needed
b. Neither support not confidence needed
c. Both minimum support and confidence are needed
d. Minimum support is needed Show Answer
Question 47 What is the final resultant cluster size in Divisive algorithm, which is one of the hierarchical clustering approaches? Select one:
a. Zero
b. Three
c. singleton
d. Two Show Answer
Question 48 If {A,B,C,D} is a frequent itemset, candidate rules which is not possible is Select one:
a. C –> A
b. D –>ABCD
c. A –> BC
d. B –> ADC Show Answer
Question 49 Which Association Rule would you prefer Select one:
a. High support and low confidence
b. Low support and high confidence
c. Low support and low confidence
d. High support and medium confidence Show Answer
Question 50 The probability that a person owns a sports car given that they subscribe to automotive magazine is 40%. We also know that 3% of the adult population subscribes to automotive magazine. The probability of a person owning a sports car given that they don’t subscribe to automotive magazine is 30%. Use this information to compute the probability that a person subscribes to automotive magazine given that they own a sports car
Select one:
a. 0.0398
b. 0.0389
c. 0.0368
d. 0.0396 Show Answer
Unit 5:
1. What is true about Data Visualization?
A. Data Visualization is used to communicate information clearly and efficiently to users by the usage of information graphics such as tables and charts. B. Data Visualization helps users in analyzing a large amount of data in a simpler way. C. Data Visualization makes complex data more accessible, understandable, and usable. D. All of the above
2. Data can be visualized using?
A. graphs B. charts C. maps D. All of the above
3. Data visualization is also an element of the broader _____________.
A. deliver presentation architecture B. data presentation architecture C. dataset presentation architecture D. data process architecture
4. Which method shows hierarchical data in a nested format?
A. Treemaps B. Scatter plots C. Population pyramids D. Area charts
5. Which is used to inference for 1 proportion using normal approx?
A. fisher.test() B. chisq.test() C. Lm.test() D. prop.test()
6. Which is used to find the factor congruence coefficients?
A. factor.mosaicplot B. factor.xyplot C. factor.congruence D. factor.cumsum
7. Which of the following is tool for checking normality?
A. qqline() B. qline() C. anova() D. lm()
8. Which of the following is false?
A. Data visualization includes the ability to absorb information quickly B. Data visualization is another form of visual art C. Data visualization decreases insights and leads to slower decisions D. None of the above
9. Common use cases for data visualization include?
A. Politics B. Sales and marketing C. Healthcare D. All of the above
10. Which of the following plots are often used for checking randomness in time series?
A. Autocausation B. Autorank C. Autocorrelation D. None of the above
11. Which are pros of data visualization?
A. It can be accessed quickly by a wider audience. B. It can misrepresent information C. It can be distracting D. None Of the above
12. Which are cons of data visualization?
A. It conveys a lot of information in a small space. B. It makes your report more visually appealing.
C. visual data is distorted or excessively used. D. None Of the above
13. Which of the intricate techniques is not used for data visualization?
A. Bullet Graphs B. Bubble Clouds C. Fever Maps D. Heat Maps
14. Which one of the following is most basic and commonly used techniques?
A. Line charts B. Scatter plots C. Population pyramids D. Area charts
15. Which is used to query and edit graphical settings?
A. anova() B. par() C. plot() D. cum()
16. Which of the following method make vector of repeated values?
A. rep() B. data() C. view() D. read()
17. Who calls the lower level functions lm.fit?
A. lm() B. col.max
C. par D. histo
18. Which of the following lists names of variables in a data.frame?
A. par() B. names() C. barchart() D. quantile()
19. Which of the following statements is true?
A. Scientific visualization, sometimes referred to in shorthand as SciVis B. Healthcare professionals frequently use choropleth maps to visualize important health data. C. Candlestick charts are used as trading tools and help finance professionals analyze price movements over time D. All of the above
20. ________is used for density plots?
A. par B. lm C. kde D. C
Answer key:
Unit :1
1
Ans : D
Explanation: Data Analysis is a process of inspecting, cleaning, transforming and modelling data with the goal of discovering useful information, suggesting conclusions and supporting decision-making.
2. Ans : B
Explanation: Predictive Analytics is a major data analysis approach, not Predictive Intelligence.
3. Ans : A
Explanation: In data analysis, two main statistical methodologies are used: descriptive statistics and inferential statistics.
4. Ans : C
Explanation: In descriptive statistics, data from the entire population or a sample is summarized with numerical descriptors.
5. Ans : D
Explanation: Data Analysis was defined by the statistician John Tukey in 1961 as "procedures for analyzing data".
6. Ans : A
Explanation: answering yes/no questions about the data (hypothesis testing)
7. Ans : A
Explanation: The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
8. Ans : D
Explanation: The branch of statistics which deals with development of particular statistical methods is classified as applied statistics.
9.
Ans : C
Explanation: modeling relationships within the data (E.g. regression analysis).
10 Ans : A
Explanation: Text Data Mining is the process of deriving high-quality information from text.
11 personalization
12.
CRM analytics
13. business intelligence
14. database marketing
15. hosted CRM
16. All of the above
17. Cascalog
18. All of these
19. All the above
20. All of the above
21. Project Prism
22. Recall
UNIT 2:
1. b
2. c
3. A
4. c
5. a
6. b
7. a
8. c
9. B
10. D
11. A
12. B
13. D
14.A
15. D
16. A
17. B
18. C
19. A broad term, the most commonly used technique for doing factor analysis.
20. C
21. Answer: c Explanation: With fuzzy logic set membership is defined by certain value. Hence it could have many values to be in the set.
22. Answer: a Explanation: In traditional set theory, set membership is fixed or exact: either the member is in the set or not, with only two crisp values, true or false. In fuzzy logic there are many values; with some weight x the member is in the set.
23. Answer: a Explanation: Refer to the definitions of fuzzy set and crisp set.
24. Answer: a Explanation: None.
25. Answer: a Explanation: Fuzzy logic deals with linguistic variables.
26. Answer: b Explanation: Both probabilities and degrees of truth range between 0 and 1.
27. Answer: a Explanation: None.
28. Answer: d Explanation: The AND, OR, and NOT operators of Boolean logic exist in fuzzy logic, usually defined as the minimum, maximum, and complement.
29. Answer: a Explanation: None.
30. Answer: b Explanation: Fuzzy set theory defines fuzzy operators on fuzzy sets. The problem in applying this is that the appropriate fuzzy operator may not be known. For this reason, fuzzy logic usually uses IF-THEN rules, or constructs that are equivalent, such as fuzzy associative matrices. Rules are usually expressed in the form: IF variable IS property THEN action.
31. Answer: a Explanation: Once fuzzy relations are defined, it is possible to develop fuzzy relational databases. The first fuzzy relational database, FRDB, appeared in Maria Zemankova's dissertation.
32. Answer: d Explanation: Entropy is the amount of uncertainty involved in data, represented by H(data).
33. Answer: c Explanation: Local structure is usually associated with linear rather than exponential growth in complexity.
Unit 4:
1. K-Means clustering
2. agglomerative clustering
3. As the value of one attribute decreases the value of the second attribute increases.
4. O(tkn)
5. Y is true when X is known to be true
6. Hierarchical clustering algorithm
7. Fuzzy
8. DBSCAN
9. All attributes must be numeric
10. min-max normalization
11. increases with the size of the maximum frequent set
12. lift
13. Neural network learning algorithms are guaranteed to converge to an optimal solution
14. DBSCAN
15. 2^k – 2 candidate association rules
16. K-nearest neighbor
17. mean absolute error
18. Superset of both closed frequent item sets and maximal frequent item sets
19. 38
20. 60
21. Grouping similar objects
22. high intra class similarity
23. Min points and eps
24. Both models require input attributes to be numeric.
25. 4950
26. Candidate generation
27. DBSCAN
28. As the value of one attribute decreases the value of the second attribute increases.
29. are better able to deal with missing and noisy data
30. a priori
31. knowledge
32. The nature of the problem determines how outliers are used
33. Minkowski
34. Density methods
35. Low support and high confidence
36. an a priori probability
37. Exhaustive
38. If a set cannot pass a test, its supersets will also fail the same test
39. Frequent
40. Descriptive and unsupervised
41. 0.0396
42. linear
43. K-nearest neighbor
44. Dendrogram
45. Pruning
46. Both minimum support and confidence are needed
47. singleton
48. D –> ABCD
49. Low support and high confidence
50. 0.0396
Unit 5:
1. Ans : D
Explanation: Data Visualization is used to communicate information clearly and efficiently to users by the usage of information graphics such as tables and charts. It helps users in analyzing a large amount of data in a simpler way. It makes complex data more accessible, understandable, and usable.
2.
Ans : D
Explanation: Data visualization is a graphical representation of quantitative information and data by using visual elements like graphs, charts, and maps.
3. Ans : B
Explanation: Data visualization is also an element of the broader data presentation architecture (DPA) discipline, which aims to identify, locate, manipulate, format and deliver data in the most efficient way possible.
4.
Ans : A
Explanation: Treemaps are best used when multiple categories are present, and the goal is to compare different parts of a whole.
5
Ans : D
Explanation: prop.test() is used for inference on one proportion using the normal approximation.
6. Ans : C
Explanation: factor.congruence is used to find the factor congruence coefficients.
7. Ans : A
Explanation: qqline() (together with qqnorm()) is a tool for checking normality.
8. Ans : C
Explanation: "Data visualization decreases insights and leads to slower decisions" is a false statement.
9. Ans : D
Explanation: All option are Common use cases for data visualization.
10. Ans : C
Explanation: If the time series is random, such autocorrelations should be near zero for any and all timelag separations.
11. Ans : A
Explanation: Pros of data visualization : it can be accessed quickly by a wider audience.
12.
Ans : C
Explanation: It can be distracting : if the visual data is distorted or excessively used.
13. Ans : C
Explanation: Fever maps are not used for data visualization; fever charts are used instead.
14. Ans : A
Explanation: Line charts. This is one of the most basic and common techniques used. Line charts display how variables can change over time.
15. Ans : B
Explanation: par() is used to query and edit graphical settings.
16. Ans : A
Explanation: rep() makes a vector of repeated values; data() loads a built-in dataset (often into a data.frame).
17. Ans : A
Explanation: lm calls the lower level functions lm.fit.
18.
Ans : B
Explanation: names() lists the names of the variables (columns) in a data.frame.
19.
Ans : D
Explanation: All option are correct.
20. Ans : C
Explanation: kde is used for density plots.
MCQ for UNIT 5
1. Point out the correct statement. a) Hadoop is an ideal environment for extracting and transforming small volumes of data b) Hadoop stores data in HDFS and supports data compression/decompression c) The Giraph framework is less useful than a MapReduce job to solve graph and machine learning d) None of the mentioned
2. Which of the following genres does Hadoop produce? a) Distributed file system b) JAX-RS c) Java Message Service d) Relational Database Management System
3. Which of the following platforms does Hadoop run on? a) Bare metal b) Debian c) Cross-platform d) Unix-like
4. Hadoop achieves reliability by replicating the data across multiple hosts and hence does not require ________ storage on hosts. a) RAID b) Standard RAID levels c) ZFS d) Operating system
5. The Hadoop list includes the HBase database, the Apache Mahout ________ system, and matrix operations. a) Machine learning b) Pattern recognition c) Statistical classification d) Artificial intelligence
6. As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including _______________ a) Improved data storage and information retrieval b) Improved extract, transform and load features for data integration c) Improved data warehousing functionality d) Improved security, workload management, and SQL support
7. Point out the correct statement. a) Hadoop do need specialized hardware to process the data b) Hadoop 2.0 allows live stream processing of real-time data c) In the Hadoop programming framework output files are divided into lines or records d) None of the mentioned
8. According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop? a) Big data management and data mining b) Data warehousing and business intelligence c) Management of Hadoop clusters d) Collecting and storing unstructured data
9. Hadoop is a framework that works with a variety of related tools. Common cohorts include ____________ a) MapReduce, Hive and HBase b) MapReduce, MySQL and Google Apps c) MapReduce, Hummer and Iguana d) MapReduce, Heron and Trumpet
10. Point out the wrong statement. a) Hardtop processing capabilities are huge and its real advantage lies in the ability to process terabytes & petabytes of data b) Hadoop uses a programming model called “MapReduce”, all the programs should conform to this model in order to work on the Hadoop platform c) The programming model, MapReduce, used by Hadoop is difficult to write and test d) All of the mentioned
11. What was Hadoop named after? a) Creator Doug Cutting’s favorite circus act b) Cutting’s high school rock band c) The toy elephant of Cutting’s son d) A sound Cutting’s laptop made during Hadoop development
12. All of the following accurately describe Hadoop, EXCEPT ____________ a) Open-source b) Real-time c) Java-based d) Distributed computing approach
13. __________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data. a) MapReduce b) Mahout
c) Oozie d) All of the mentioned
14. __________ has the world’s largest Hadoop cluster. a) Apple b) Datamatics c) Facebook d) None of the mentioned
15. Facebook Tackles Big Data With _______ based on Hadoop. a) ‘Project Prism’ b) ‘Prism’ c) ‘Project Big’ d) ‘Project Data’
16. ________ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets. a) Pig Latin b) Oozie c) Pig d) Hive
17. Point out the correct statement. a) Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data b) Hive is a relational database with SQL support c) Pig is a relational database with SQL support d) All of the mentioned
18. Hive also support custom extensions written in ____________ a) C# b) Java c) C d) C++
19. Point out the wrong statement. a) Elastic MapReduce (EMR) is Facebook’s packaged Hadoop offering b) Amazon Web Service Elastic MapReduce (EMR) is Amazon’s packaged Hadoop offering c) Scalding is a Scala API on top of Cascading that removes most Java boilerplate d) All of the mentioned
20. ___________ is general-purpose computing model and runtime system for distributed data analytics. a) Mapreduce b) Drill
c) Oozie d) None of the mentioned
21. The Pig Latin scripting language is not only a higher-level data flow language but also has operators similar to ____________ a) SQL b) JSON c) XML d) All of the mentioned
22. _______ jobs are optimized for scalability but not latency. a) Mapreduce b) Drill c) Oozie d) Hive
23. A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the JobTracker. a) MapReduce b) Mapper c) TaskTracker d) JobTracker
24. Point out the correct statement. a) MapReduce tries to place the data and the compute as close as possible b) Map Task in MapReduce is performed using the Mapper() function c) Reduce Task in MapReduce is performed using the Map() function d) All of the mentioned
25. ___________ part of the MapReduce is responsible for processing one or more chunks of data and producing the output results. a) Maptask b) Mapper c) Task execution d) All of the mentioned
26. _________ function is responsible for consolidating the results produced by each of the Map() functions/tasks. a) Reduce b) Map c) Reducer d) All of the mentioned
27. ________ is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer.
a) Hadoop Strdata b) Hadoop Streaming c) Hadoop Stream d) None of the mentioned
28. __________ maps input key/value pairs to a set of intermediate key/value pairs. a) Mapper b) Reducer c) Both Mapper and Reducer d) None of the mentioned
29. The number of maps is usually driven by the total size of ____________ a) inputs b) outputs c) tasks d) None of the mentioned
30. Running a ___________ program involves running mapping tasks on many or all of the nodes in our cluster. a) MapReduce b) Map c) Reducer d) All of the mentioned
31. A ________ serves as the master and there is only one NameNode per cluster. a) Data Node b) NameNode c) Data block d) Replication
32. Point out the correct statement. a) DataNode is the slave/worker node and holds the user data in the form of Data Blocks b) Each incoming file is broken into 32 MB by default c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance d) None of the mentioned
33. HDFS works in a __________ fashion. a) master-worker b) master-slave c) worker/slave d) all of the mentioned
34. Point out the wrong statement. a) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file level b) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode c) User data is stored on the local file system of DataNodes d) DataNode is aware of the files to which the blocks stored on it belong to
35. Which of the following scenario may not be a good fit for HDFS? a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file b) HDFS is suitable for storing data related to applications requiring low latency data access c) HDFS is suitable for storing data related to applications requiring low latency data access d) None of the mentioned
36. The need for data replication can arise in various scenarios like ____________ a) Replication Factor is changed b) DataNode goes down c) Data Blocks get corrupted d) All of the mentioned
37. ________ is the slave/worker node and holds the user data in the form of Data Blocks. a) DataNode b) NameNode c) Data block d) Replication
38. HDFS provides a command line interface called __________ used to interact with HDFS. a) “HDFS Shell” b) “FS Shell” c) “DFS Shell” d) None of the mentioned
39. HDFS is implemented in _____________ programming language. a) C++ b) Java c) Scala d) None of the mentioned
40. For YARN, the ___________ Manager UI provides host and port information. a) Data Node b) NameNode
c) Resource d) Replication
41. During start up, the ___________ loads the file system state from the fsimage and the edits log file. a) DataNode b) NameNode c) ActionNode d) None of the mentioned
42. Point out the correct statement. a) A Hadoop archive maps to a file system directory b) Hadoop archives are special format archives c) A Hadoop archive always has a *.har extension d) All of the mentioned
43. Using Hadoop Archives in __________ is as easy as specifying a different input filesystem than the default file system. a) Hive b) Pig c) MapReduce d) All of the mentioned
44. Pig operates in mainly how many nodes? a) Two b) Three c) Four d) Five
45. Point out the correct statement. a) You can run Pig in either mode using the “pig” command b) You can run Pig in batch mode using the Grunt shell c) You can run Pig in interactive mode using the FS shell d) None of the mentioned
46. You can run Pig in batch mode using __________ a) Pig shell command b) Pig scripts c) Pig options d) All of the mentioned
47. Pig Latin statements are generally organized in one of the following ways? a) A LOAD statement to read data from the file system b) A series of “transformation” statements to process the data
c) A DUMP statement to view results or a STORE statement to save the results d) All of the mentioned
48. Point out the wrong statement. a) To run Pig in local mode, you need access to a single machine b) The DISPLAY operator will display the results to your terminal screen c) To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation d) All of the mentioned
49. Which of the following function is used to read data in PIG? a) WRITE b) READ c) LOAD d) None of the mentioned
50. You can run Pig in interactive mode using the ______ shell. a) Grunt b) FS c) HDFS d) None of the mentioned
51. HBase is a distributed ________ database built on top of the Hadoop file system. a) Column-oriented b) Row-oriented c) Tuple-oriented d) None of the mentioned
52. Point out the correct statement. a) HDFS provides low latency access to single rows from billions of records (Random access) b) HBase sits on top of the Hadoop File System and provides read and write access c) HBase is a distributed file system suitable for storing large files d) None of the mentioned
53. HBase is ________ defines only column families. a) Row Oriented b) Schema-less c) Fixed Schema d) All of the mentioned
54. Apache HBase is a non-relational database modeled after Google’s _________ a) BigTop b) Bigtable
c) Scanner d) FoundationDB
55. Point out the wrong statement. a) HBase provides only sequential access to data b) HBase provides high latency batch processing c) HBase internally provides serialized access d) All of the mentioned
56. The _________ Server assigns regions to the region servers and takes the help of Apache ZooKeeper for this task. a) Region b) Master c) Zookeeper d) All of the mentioned
57. Which of the following command provides information about the user? a) status b) version c) whoami d) user
58. Which of the following command does not operate on tables? a) enabled b) disabled c) drop d) all of the mentioned
59. _________ command fetches the contents of a row or a cell. a) select b) get c) put d) none of the mentioned
60. HBaseAdmin and ____________ are the two important classes in this package that provide DDL functionalities. a) HTableDescriptor b) HDescriptor c) HTable d) HTabDescriptor
61. Which of the following is not a NoSQL database? a) SQL Server b) MongoDB
c) Cassandra d) None of the mentioned
62. Point out the correct statement. a) Documents can contain many different key-value pairs, or key-array pairs, or even nested documents b) MongoDB has official drivers for a variety of popular programming languages and development environments c) When compared to relational databases, NoSQL databases are more scalable and provide superior performance d) All of the mentioned
63. Which of the following is a NoSQL Database Type? a) SQL b) Document databases c) JSON d) All of the mentioned
64. Which of the following is a wide-column store? a) Cassandra b) Riak c) MongoDB d) Redis
65. Point out the wrong statement. a) Non Relational databases require that schemas be defined before you can add data b) NoSQL databases are built to allow the insertion of data without a predefined schema c) NewSQL databases are built to allow the insertion of data without a predefined schema d) All of the mentioned
66. Most NoSQL databases support automatic __________ meaning that you get high availability and disaster recovery. a) processing b) scalability c) replication d) all of the mentioned
67. Which of the following are the simplest NoSQL databases? a) Key-value b) Wide-column c) Document d) All of the mentioned
68. ________ stores are used to store information about networks, such as social connections. a) Key-value b) Wide-column c) Document d) Graph
69. NoSQL databases is used mainly for handling large volumes of ______________ data. a) unstructured b) structured c) semi-structured d) all of the mentioned
70. Point out the wrong statement? a) Key feature of R was that its syntax is very similar to S b) R runs only on Windows computing platform and operating system c) R has been reported to be running on modern tablets, phones, PDAs, and game consoles d) R functionality is divided into a number of Packages
71. R functionality is divided into a number of ________ a) Packages b) Functions c) Domains d) Classes
72. Which Package contains most fundamental functions to run R? a) root b) child c) base d) parent
73. Point out the wrong statement? a) One nice feature that R shares with many popular open source projects is frequent releases b) R has sophisticated graphics capabilities c) S’s base graphics system allows for very fine control over essentially every aspect of a plot or graph d) All of the mentioned
74. Which of the following is a base package for R language? a) util b) lang
c) tools d) spatial
75. Which of the following is “Recommended” package in R? a) util b) lang c) stats d) spatial
76. What is the output of getOption(“defaultPackages”) in R studio? a) Installs a new package b) Shows default packages in R c) Error d) Nothing will print
77. Which of the following is used for Statistical analysis in R language? a) RStudio b) Studio c) Heck d) KStudio
78. In R language, a vector is defined that it can only contain objects of the ________ a) Same class b) Different class c) Similar class d) Any class
79. A list is represented as a vector but can contain objects of ___________ a) Same class b) Different class c) Similar class d) Any class
80. How can we define ‘undefined value’ in R language? a) Inf b) Sup c) Und d) NaN
81. What is NaN called? a) Not a Number b) Not a Numeric c) Number and Number d) Number a Numeric
82. How can we define ‘infinity’ in R language? a) Inf b) Sup c) Und d) NaN
83. Which one of the following is not a basic datatype? a) Numeric b) Character c) Data frame d) Integer
84. Matrices can be created by row-binding with the help of the following function. a) rjoin() b) rbind() c) rowbind() d) rbinding()
85. What is the function used to test objects (returns a logical operator) if they are NA? a) is.na() b) is.nan() c) as.na() d) as.nan()
86. What is the function used to test objects (returns a logical operator) if they are NaN? a) as.nan() b) is.na() c) as.na() d) is.nan()
87. What is the function to set column names for a matrix? a) names() b) colnames() c) col.names() d) column name cannot be set for a matrix
88. The most convenient way to use R is at a graphics workstation running a ________ system. a) windowing b) running c) interfacing d) matrix
89. Point out the wrong statement? a) Setting up a workstation to take full advantage of the customizable features of R is a
straightforward thing b) q() is used to quit the R program c) R has an inbuilt help facility similar to the man facility of UNIX d) Windows versions of R have other optional help systems also
90. Point out the wrong statement? a) Windows versions of R have other optional help system also b) The help.search command (alternatively ??) allows searching for help in various ways c) R is case insensitive as are most UNIX based packages, so A and a are different symbols and would refer to different variables d) $ R is used to start the R program
91. Elementary commands in R consist of either _______ or assignments. a) utilstats b) language c) expressions d) packages
92. How to install for a package and all of the other packages on which for depends? a) install.packages (for, depends = TRUE) b) R.install.packages (“for”, depends = TRUE) c) install.packages (“for”, depends = TRUE) d) install (“for”, depends = FALSE)
93. __________ function is used to watch for all available packages in library. a) lib() b) fun.lib() c) libr() d) library()
94. Attributes of an object (if any) can be accessed using the ______ function. a) objects() b) attrib() c) attributes() d) obj()
95. R objects can have attributes, which are like ________ for the object. a) metadata b) features c) expression d) dimensions
96. ________ generate random Normal variates with a given mean and standard deviation. a) dnorm b) rnorm
c) pnorm d) rpois
97. Point out the correct statement? a) R comes with a set of pseudo-random number generators b) Random number generators cannot be used to model random inputs c) Statistical procedure does not require random number generation d) For each probability distribution there are typically three functions
98. ______ evaluate the cumulative distribution function for a Normal distribution. a) dnorm b) rnorm c) pnorm d) rpois
99. _______ generate random Poisson variates with a given rate. a) dnorm b) rnorm c) pnorm d) rpois
100. Point out the wrong statement? a) For each probability distribution there are typically three functions b) For each probability distribution there are typically four functions c) r function is sufficient for simulating random numbers d) R comes with a set of pseudo-random number generators
101. _________ is the most common probability distribution to work with. a) Gaussian b) Parametric c) Paradox d) Simulation
102. Point out the correct statement? a) When simulating any random numbers it is not essential to set the random number seed b) It is not possible to generate random numbers from other probability distributions like the Poisson c) You should always set the random number seed when conducting a simulation d) Statistical procedure does not require random number generation
103. _______ function is used to simulate binary random variables. a) dnorm b) rbinom() c) binom() d) rpois
104. Point out the wrong statement? a) Drawing samples from specific probability distributions can be done with “s” functions b) The sample() function draws randomly from a specified set of (scalar) objects allowing you to sample from arbitrary distributions of numbers c) The sampling() function draws randomly from a specified set of objects d) You should always set the random number seed when conducting a simulation
105. _______ grammar makes a clear distinction between your data and what gets displayed on the screen or page. a) ggplot1 b) ggplot2 c) d3.js d) ggplot3
106. Point out the wrong statement? a) mean_se is used to calculate mean and standard errors on either side b) hmisc wraps up a selection of summary functions from Hmisc to make it easy to use c) plot is used to create a scatterplot matrix (experimental) d) translate_qplot_base is used for translating between qplot and base graphics
107. Which of the following cuts numeric vector into intervals of equal length? a) cut_interval b) cut_time c) cut_number d) cut_date
108. Which of the following is a plot to investigate the order in which observations were recorded? a) ggplot b) ggsave c) ggpcp d) ggorder
109. ________ is used for translating between qplot and base graphics. a) translate_qplot_base b) translate_qplot_gpl c) translate_qplot_lattice d) translate_qplot_ggplot
110. Which of the following is discrete state calculator? a) discrete_scale b) ggpcp c) ggfluctuation d) ggmissing
111. Which of the following creates fluctuation plot? a) ggmissplot b) ggmissing c) ggfluctuation d) ggpcp
112. __________ create a complete ggplot appropriate to a particular data type. a) autoplot b) is.ggplot c) printplot d) qplot_ggplot
113. Which of the following creates a new ggplot plot from a data frame? a) qplot_ggplot b) ggplot.data.frame c) ggfluctuation d) ggmissplot
Department of Information Technology
DATA ANALYTICS – KIT601 – Question Bank
UNIT-1
1. Data originally collected in the process of investigation are known as a) Foreign data b) Primary data c) Third data d) Secondary data e) None of these
2. Statistical enquiry means a) It is science for knowledge b) Search for knowledge c) Collection of anything d) Search for knowledge with the help of statistical methods e) None of these
3. Cluster sampling means a) Sample is divided into number of sub-groups b) Sample are selected at regular interval c) Sample is obtained by conscious selection d) Universe is divided into groups e) None of these
4. What is Secondary data? a) Data collected in the process of investigation b) Data collected from some other agency c) Data collected from questionnaire of a person d) Both A & B e) None of these
5. What is information? a) Raw facts b) Processed data c) Understanding facts d) Knowing action on data e) None of these
6. Data about rocks is an example of a) Time dependent data b) Time Independent data c) Location dependent data d) Location independent data e) None of these
7. Range on temperature scale is termed as a) Nominal data b) Ordinal data
Department of Information Technology
c) Interval data d) Ratio data e) None of these
8. Data in XML and CSV format is an example of a) Structure data b) Un-structure data c) Semi-structure data d) Both A & B e) None of these
9. Which is not the characteristic of data a) Accuracy b) Consistency c) Granularity d) Redundant e) None of these
10. Hadoop is a framework that works with a variety of related tools. Common cohorts include: a) MapReduce, Hive and HBase b) MapReduce, MySQL and Google Apps c) MapReduce, Hummer and Iguana d) MapReduce, Heron and Trumpet e) None of these
11. Which is not the V in BIG data a) Volume b) Veracity c) Vigor d) Velocity e) None of these
12. Which is not true about Traditional decision making? a) Does not require human intervention b) Takes a long time to come to decision c) Lacks systematic linkage in planning d) Provides limited scope of data analytics e) None of these
13. Cloudera is a product of a) Microsoft b) Apache c) Google d) Facebook e) None of these
14. What is not true about MPP architecture? a) Tightly coupled nodes b) High speed connection among nodes c) Disks are not shared
d) Uses a lot of processors e) None of these
15. The process of organizing and summarizing data in an easily readable format to communicate important information is known as a) Analysis b) Reporting c) Clustering d) Mining e) None of these
16. Out of the following which is not a type of report a) Canned b) Dashboard c) Ad hoc response d) Alerts e) None of these
17. Data Analysis is a process of? a) inspecting data b) cleaning data c) transforming data d) All of above e) None of these
18. Which of the following is not a major data analysis approaches? a) Data Mining b) Predictive Intelligence c) Business Intelligence d) Text Analytics e) None of these
19. How many main statistical methodologies are used in data analysis? a) 2 b) 3 c) 4 d) 5 e) None of these
20. Which of the following is true about regression analysis? a) answering yes/no questions about the data b) estimating numerical characteristics of the data c) modeling relationships within the data d) describing associations within the data e) None of these
21. __________ may be defined as the data objects that do not comply with the general behavior or model of the data available. a) Outlier Analysis b) Evolution Analysis
c) Prediction d) Classification e) None of these
22. What is the use of data cleaning? a) to remove the noisy data b) correct the inconsistencies in data c) transformations to correct the wrong data. d) All of the above e) None of these
23. In data mining, this is a technique used to predict future behavior and anticipate the consequences of change. a) predictive technology b) disaster recovery c) phase change d) predictive modeling e) None of these
24. What are the main components of Big Data? a) MapReduce b) HDFS c) HBASE d) All of these e) None of these
25. ———- is data that depends on the data model and resides in a fixed field within a record. a) Structured data b) Un-Structured data c) Semi-Structured data d) Scattered e) None of these
26. —————- is about developing code to enable the machine to learn to perform tasks, and its basic principle is the automatic modeling of the underlying processes that have generated the collected data. a) Data Science b) Data Analytics c) Data Mining d) Data Warehousing e) None of these
27. —————– is an example of human generated unstructured data. a) YouTube data b) Satellite data c) Sensor data d) Seismic imagery data e) None of these
28. Height is an example of which type of attribute a) Nominal b) Binary c) Ordinal d) Numeric e) None of these
29. ————- type of analytics describes what happened in the past a) Descriptive b) Prescriptive c) Predictive d) Probability e) None of these
30. ————– data does not fit into a data model due to variations in content a) Structured data b) Un-Structured data c) Semi-Structured data d) Both B & C e) None of these
UNIT-2
31. A and B are two events. If P(A, B) decreases while P(A) increases, which of the following is true? a) P(A|B) decreases b) P(B|A) decreases c) P(B) decreases d) All of above e) None of these
32. Suppose we like to calculate P(H|E, F) and we have no conditional independence information. Which of the following sets of numbers are sufficient for the calculation? a) P(E, F), P(H), P(E|H), P(F|H) b) P(E, F), P(H), P(E, F|H) c) P(H), P(E|H), P(F|H) d) P(E, F), P(E|H), P(F|H) e) None of these
33. Suppose you are given an EM algorithm that finds maximum likelihood estimates for a model with latent variables. You are asked to modify the algorithm so that it finds MAP estimates instead. Which step or steps do you need to modify? a) Expectation b) Maximization c) No modification necessary d) Both A & B e) None of these
34. Compared to the variance of the Maximum Likelihood Estimate (MLE), the variance of the Maximum A Posteriori (MAP) estimate is ________ a) higher b) same c) lower d) it could be any of the above e) None of these
35. One reason Bayesian methods are important to our study of machine learning is that they provide a useful perspective for understanding many learning algorithms that do not ............................ manipulate probabilities. a) explicitly b) implicitly c) both a & b d) approximately e) None of these
36. The results that we get after we apply Bayesian Theorem to a problem are, a) 100% accurate b) Estimated values c) Wrong values d) Only positive values e) None of these
37. The previous probabilities in Bayes theorem that are changed with the help of new available information are classified as a) independent probabilities b) posterior probabilities c) interior probabilities d) dependent probabilities e) None of these
38. In contrast to the naive Bayes classifier, Bayesian belief networks allow stating conditional independence assumptions that apply to ............................... of the variables. a) subsets b) super sets c) empty set d) All of above e) None of these
39. The naive Bayes classifier applies to learning tasks where each instance x is described by a conjunction of attribute values and where the target function f ( x ) can take on ................. value from some................... set V. a) one, finite b) any, infinite c) one, infinite d) any, finite e) None of these
40. Bayes rule can be used to........................conditioned on one piece of evidence. a) solve queries b) increase complexity of a query c) decrease complexity of a query d) answer probabilistic queries e) None of these
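For reference, a minimal worked sketch in Python (with made-up probabilities) of how Bayes' rule answers a probabilistic query conditioned on one piece of evidence:

# Hypothetical numbers: P(H) prior, P(E|H) and P(E|not H) likelihoods
p_h = 0.01
p_e_given_h = 0.95
p_e_given_not_h = 0.05
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)   # total probability of the evidence
p_h_given_e = p_e_given_h * p_h / p_e                   # posterior P(H|E)
print(round(p_h_given_e, 3))                            # about 0.161: the prior is updated by the evidence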
41. Among which of the following mentioned statements can the Bayesian probability be applied? (i) In the cases, where we have one event (ii) In the cases, where we have two events (iii) In the cases, where we have three events (iv) In the cases, where we have more than three events
Options:
a) Only iv. b) All i., ii., iii. and iv. c) ii. and iv. d) Only ii. e) None of these
42. How the Bayesian network can be used to answer any query? a) Full distribution b) Joint distribution
c) Partial distribution d) All of the mentioned above e) None of these
43. Which of the following methods do we use to find the best fit line for data in Linear Regression? a) Least Square Error b) Maximum Likelihood c) Logarithmic Loss d) Both A and B e) None of these
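As an illustration (a minimal sketch with made-up data, assuming NumPy is available), fitting the best-fit line by Least Square Error:

import numpy as np
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])
A = np.vstack([x, np.ones_like(x)]).T           # design matrix for y = a*x + b
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)  # minimises the sum of squared errors
print(a, b)                                     # slope and intercept of the best-fit line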
44. Linear Regression is a ..................... machine learning algorithm. a) supervised b) unsupervised c) reinforcement d) Both A & B e) None of these
45. Which of the following statement is true about outliers in Linear regression? a) Linear regression is not sensitive to outliers b) Linear regression is sensitive to outliers c) Can’t say d) There are no outliers e) None of these
46. Which of the following sentence is FALSE regarding regression? a) It relates inputs to outputs. b) It is used for prediction. c) It may be used for interpretation. d) It discovers causal relationships. e) None of these
47. Which of the following methods do we use to best fit the data in Logistic Regression? a) Least Square Error b) Maximum Likelihood c) Jaccard distance d) Both A & B e) None of these
48. Which of the following options is true? a) Linear Regression error values have to be normally distributed but in the case of Logistic Regression it is not the case b) Logistic Regression error values have to be normally distributed but in the case of Linear Regression it is not the case c) Both Linear Regression and Logistic Regression error values have to be normally distributed d) Neither Linear Regression nor Logistic Regression error values have to be normally distributed e) None of these
49. A decision tree is also known as a) general tree b) binary tree c) prediction tree d) fuzzy tree e) None of these
50. The confusion matrix is a useful tool for analyzing a) Regression b) Classification c) Sampling d) Cross Validation e) None of these
51. In regression the independent variable is also called as ———– a) Regressor b) Continuous c) Regressand d) Estimated e) None of these
52. ————— searches for the linear optimal separating hyperplane for separation of the data using essential training tuples called support vectors a) Decision tree b) Association Rule Mining c) Clustering d) Support vector machines e) None of these
53. Which of the following is used as attribute selection measure in decision tree algorithms? a) Information Gain b) Posterior probability c) Prior probability d) Support e) None of these
54. ———- is unsupervised technique aiming to divide a multivariate dataset into clusters or groups. a) KNN b) SVM c) Regression d) Cluster Analysis e) None of these
55. A perfect negative correlation is signified by ————- a) 1 b) -1 c) 0 d) 2
e) None of these
56. ———— rule mining is a technique to identify underlying relations between different items. a) Classification b) Regression c) Clustering d) Association e) None of these
57. ———– is supervised machine learning algorithm outputs an optimal hyperplane for given labeled training data a) KNN b) SVM c) Regression d) Decision Tree e) None of these
58. Which of the following is a measure used in decision trees for selecting the splitting criterion that partitions the data in the best possible manner? a) Probability b) Gini Index c) Regression d) Confusion matrix e) None of these
59. Which of the following is not a type of clustering algorithm? a) Density clustering b) K-Means clustering c) Centroid clustering d) Simple clustering e) None of these
60. —— answers the questions like ” How can we make it happen?” a) Descriptive b) Prescriptive c) Predictive d) Probability e) None of these
UNIT-3
61. A company wants to divide its customers into distinct groups to send offers; this is an example of a) Data Extraction b) Data Classification c) Data Discrimination d) Data Selection e) None of these
62. When do we use Manhattan distance in data mining? a) Dimension of the data decreases b) Dimension of the data increases c) Under fitting d) Moderate size of the dimensions e) None of these
63. When there is no impact on one variable when increase or decrease on other variable then it is ———— a) Perfect correlation b) Positive correlation c) Negative correlation d) No correlation e) None of these
64. Apriori algorithm uses breadth first search and ————structure to count candidate item sets efficiently. a) Decision tree b) Hash Tree c) Red-Black Tree d) AVL Tree e) None of these
65. To determine basic salary of an employee when his qualification is given is a ———– problem a) Correlation b) Regression c) Association d) Qualitative e) None of these
66. ———— is the step performed by a data scientist after acquiring the data. a) Data Cleansing b) Data Integration c) Data Replication d) Data loading e) None of these
67. ———– is an indication of how often the rule has been found to be true in association rule mining. a) Confidence
b) Support c) Lift d) Accuracy e) None of these
68. Which of the following statements about data streaming is true? a) Stream data is always unstructured data. b) Stream data often has a high velocity. c) Stream elements cannot be stored on disk. d) Stream data is always structured data. e) None of these
69. A Bloom filter guarantees no a) false positives b) false negatives c) false positives and false negatives d) false positives or false negatives, depending on the Bloom filter type e) None of these
70. The FM-sketch algorithm can be used to: a) Estimate the number of distinct elements. b) Sample data with a time-sensitive window. c) Estimate the frequent elements. d) Determine whether an element has already occurred in previous stream data. e) None of these
71. The DGIM algorithm was developed to estimate the count of 1's occurring within the last k bits of a stream window N. Which of the following statements is true about the estimate of the number of 0's based on DGIM? a) The number of 0's cannot be estimated at all. b) The number of 0's can be estimated with a maximum guaranteed error. c) To estimate the number of 0's and 1's with a guaranteed maximum error, DGIM has to be employed twice, once creating buckets based on 1's and once creating buckets based on 0's. d) Only 1's can be estimated, not 0's e) None of these
72. What are DGIM’s maximum error boundaries? a) DGIM always underestimates the true count; at most by 25% b) DGIM either underestimates or overestimates the true count; at most by 50% c) DGIM always overestimates the count; at most by 50% d) DGIM either underestimates or overestimates the true count; at most by 25% e) None of these
73. Which algorithm should be used to approximate the number of distinct elements in a data stream? a) Misra-Gries b) Alon-Matias-Szegedy c) DGIM d) Apriori e) None of these
74. Which of the following statements about standard Bloom filters is correct? a) It is possible to delete an element from a Bloom filter. b) A Bloom filter always returns the correct result. c) It is possible to alter the hash functions of a full Bloom filter to create more space. d) A Bloom filter always returns TRUE when testing for a previously added element. e) None of these
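A minimal illustrative Bloom filter sketch in Python (the class and its parameters are hypothetical, for illustration only); it shows why a standard Bloom filter has no false negatives but may return false positives:

import hashlib

class BloomFilter:
    def __init__(self, size=1000, num_hashes=3):
        self.size, self.num_hashes = size, num_hashes
        self.bits = [0] * size
    def _positions(self, item):
        # derive num_hashes bit positions from salted MD5 digests
        for i in range(self.num_hashes):
            digest = hashlib.md5(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size
    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1
    def might_contain(self, item):
        # False is always correct (no false negatives); True may be a false positive
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice@example.com")
print(bf.might_contain("alice@example.com"))  # always True once added
print(bf.might_contain("bob@example.com"))    # usually False, occasionally a false positive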
75. ETL stands for ________________ a) Extraction transformation and loading b) Extract Taken Lend c) Enterprise Transfer Load d) Entertainment Transference Load e) None of these
76. Which of the following is not a major data analysis approaches? a) Data Mining b) Predictive Intelligence c) Business Intelligence d) Text Analytics e) None of these
77. What do you mean by a Real-Time Analytics platform? a) Manages and processes data and helps timely decision making b) Helps to develop dynamic analysis applications c) Leads to evolution of non-business intelligence d) Hadoop e) None of these
78. Data Analysis is defined by the statistician? a) William S. b) Hans Peter Luhn c) Gregory Piatetsky-Shapiro d) John Tukey e) None of these
79. Which of the following is a wrong statement? a) The big volume actually represents Big Data b) Big Data is just about tons of data c) The data growth and social media explosion have changed how we look at the data d) All of these e) None of these
80. Which of the following emphasizes the discovery of previously unknown properties of the data? a) Machine Learning b) Big Data c) Data wrangling d) Data mining e) None of these
81. What are DGIM's maximum error boundaries? a) DGIM always underestimates the true count; at most by 25% b) DGIM either underestimates or overestimates the true count; at most by 50% c) DGIM always overestimates the count; at most by 50% d) DGIM either underestimates or overestimates the true count; at most by 25% e) None of these
82. A Bloom filter guarantees no a) false positives b) false negatives c) false positives and false negatives d) false positives or false negatives, depending on the Bloom filter e) None of these
83. Which of the following statements about the standard DGIM algorithm are false? a) DGIM operates on a time-based window. b) In DGIM, the size of a bucket is always a power of two. c) The maximum number of buckets has to be chosen beforehand. d) The buckets contain the count of 1's and each 1's specific position in the stream. e) None of these
84. What are two differences between large-scale computing and big data processing? a) hardware b) Data is more suitable for finding new patterns in data than Large Scale Computing c) amount of processing time available d) amount of data processed e) None of these
85. In the Flajolet-Martin algorithm, if the stream contains n elements with m of them unique, the algorithm runs in a) O(n) time b) constant time c) O(2n) time d) O(3n) time e) None of these
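A rough single-hash Flajolet-Martin sketch in Python (illustrative only; real implementations average over many hash functions); it makes one pass over the n stream elements, i.e. O(n) time:

import hashlib

def trailing_zeros(n):
    if n == 0:
        return 32
    count = 0
    while n & 1 == 0:
        n >>= 1
        count += 1
    return count

def fm_estimate(stream):
    max_r = 0
    for item in stream:
        h = int(hashlib.md5(str(item).encode()).hexdigest(), 16) & 0xFFFFFFFF
        max_r = max(max_r, trailing_zeros(h))   # longest run of trailing zero bits seen
    return 2 ** max_r                           # estimate of the number of distinct elements

print(fm_estimate([1, 2, 3, 2, 1, 4, 5, 3]))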
86. What are two differences between large-scale computing and big data processing? a) hardware b) Data is more suitable for finding new patterns in data than Large Scale Computing c) amount of processing time available d) number of passes made over the data e) None of these
87. What does it mean when an algorithm is said to 'scale well'? a) The running time does not increase exponentially when data becomes larger. b) The result quality goes up when the data becomes larger. c) The memory usage does not increase exponentially when data becomes larger. d) The result quality remains the same when the data becomes larger. e) None of these
89. The FM-sketch algorithm can be used to: a) Estimate the number of distinct elements. b) Sample data with a time-sensitive window. c) Estimate the frequent elements. d) Determine whether an element has already occurred in previous stream data. e) None of these
90. Which attribute is not indicative of data streaming? a) Limited amount of memory b) Limited amount of processing time c) Limited amount of input data d) Limited amount of processing power e) None of these
UNIT-4
91. Which of the following clustering types has the characteristic shown in the figure below?
a) Exploratory b) Inferential c) Causal d) Hierarchical Clustering e) None of these
92. Which of the following dimension types of graph is shown in the figure below?
a) one-dimensional b) two-dimensional c) three-dimensional d) four-dimensional e) None of these
93. Which of the following gave rise to the need for graphs in data analysis? a) Data visualization b) Communicating results
c) Decision making d) All of the mentioned e) None of these
94. Which of the following is a characteristic of an exploratory graph? a) Made slowly b) Axes are not cleaned up c) Color is used for personal information d) All of the mentioned e) None of these
95. Color and shape are used to add dimensions to graph data. a) True b) False c) Dilemma d) Incorrect Statement e) None of these
96. Which of the following information is not given by the five-number summary? a) Mean b) Median c) Mode d) All of the mentioned e) None of these
97. Which of the following is also referred to as an overlayed 1D plot? a) lattice b) barplot c) gplot d) all of the mentioned e) None of these
98. Spinning plots can be used for two-dimensional data. a) True b) False c) Incorrect d) Not Sure e) None of these
99. Point out the correct statement. a) coplots are one-dimensional data graphs b) Exploratory graphs are made quickly c) Exploratory graphs are made relatively less in number d) All of the mentioned e) None of these
100. Which of the following clustering techniques is used by the K-Means algorithm? a) Hierarchical technique b) Partitional technique c) Divisive
d) Agglomerative e) None of these
101. The SON algorithm is also known as a) PCY Algorithm b) Multistage Algorithm c) Multihash Algorithm d) Partition Algorithm e) None of these
102. Which technique is used to filter unnecessary itemsets in the PCY algorithm? a) Association Rule b) Hashing Technique c) Data Mining d) Market basket e) None of these
103. In association rules, which of the following indicates the measure of how frequently the items occur in a dataset? a) Support b) Confidence c) Basket d) Itemset e) None of these
104. Which term indicates the degree of correlation between X and Y in a dataset, if the given association rule is X-->Y? a) Confidence b) Monotonicity c) Distinct d) Hashing e) None of these
105. During start-up, the ___________ loads the file system state from the fsimage and the edits log file. a) DataNode b) NameNode c) ActionNode d) Data Action Node e) None of these
106. Which of the following scenarios may not be a good fit for HDFS? a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file b) HDFS is suitable for storing data related to applications requiring low latency data access c) HDFS is suitable for storing data related to applications requiring low latency data access d) HDFS is suitable for scenarios requiring multiple/simultaneous writes to the same file e) None of these
107. ________ is the slave/worker node and holds the user data in the form of Data Blocks. a) DataNode b) NameNode c) Data block d) Replication e) None of these
108. HDFS provides a command line interface called __________ used to interact with HDFS. a) "HDFS Shell" b) "FS Shell" c) "DFS Shell" d) None of the mentioned e) None of these
109. What is CLIQUE? a) CLIQUE is a grid-based method for finding density-based clusters in subspaces. b) CLIQUE is a click method c) Used to prune non-promising cells and to improve efficiency d) Used to measure distance e) None of these
110. CLIQUE stands for? a) Clustering In QUEst b) Common in Quest c) Calculate in Quest d) Click in Quest e) None of these
111. What are approaches for high-dimensional data clustering? a) Subspace clustering b) Projected clustering and Biclustering c) Data Clustering d) Space Clustering e) None of these
112. Applications of frequent itemset analysis include a) Related concepts, Plagiarism, Biomarkers b) Clustering c) Design d) Operation e) None of these
113. k-means is a ………-based algorithm or distance-based algorithm where we calculate the distances to assign a point to a cluster. a) Centroid b) Distance c) Neuron d) Dendron e) None of these
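A tiny centroid-based k-means sketch in Python (made-up data, assuming NumPy is available): each point is assigned to its nearest centroid, then centroids are recomputed as cluster means:

import numpy as np

def kmeans(points, k, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)            # nearest centroid for every point
        centroids = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return labels, centroids

pts = np.array([[1, 1], [1.2, 0.8], [5, 5], [5.1, 4.9], [9, 1], [8.8, 1.2]])
print(kmeans(pts, k=3))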
114. -------- is an algorithm for frequent itemset mining and association rule learning over relational databases. a) Confidence b) Apriori c) Disadvantage d) Market basket e) None of these
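A minimal two-pass, Apriori-style sketch in Python (toy baskets, illustrative only): the first pass keeps frequent single items, the second pass counts only candidate pairs built from them:

from itertools import combinations
from collections import Counter

baskets = [{"bread", "milk"}, {"bread", "eggs", "milk"}, {"bread", "eggs"}, {"milk", "eggs"}]
min_support = 2   # minimum number of baskets an itemset must appear in

item_counts = Counter(item for b in baskets for item in b)                 # pass 1: single items
frequent_items = {i for i, c in item_counts.items() if c >= min_support}

pair_counts = Counter(pair for b in baskets                                # pass 2: candidate pairs
                      for pair in combinations(sorted(b & frequent_items), 2))
print({p: c for p, c in pair_counts.items() if c >= min_support})          # frequent pairs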
115. The HBase database includes the Hadoop list, the Apache Mahout ________ system, and matrix operations. a) Statistical classification b) Pattern recognition c) Machine learning d) Artificial intelligence e) All of these
116. To discover interesting relations between objects in larger databases is an objective of ---- a) Frequent Set Mining b) Market basket Mining c) Association rules mining d) Confidence Gain e) None of these
117. Different methods for storing itemset counts in main memory are a) The triangular matrix method b) The triples method c) Angular method d) Square Method e) None of these
118. ------ is used to prune non-promising cells and to improve efficiency. a) Market basket b) Frequent itemset c) Support d) Apriori property e) None of these
119. Identify the algorithm in which, on the first pass, we count the items themselves and then determine which items are frequent; on the second pass, we count only the pairs of items both of which are found frequent on the first pass. a) DGIM b) CURE c) PageRank d) Apriori e) None of these
120. A resource used for sharing data globally by all nodes is a) Distributed Cache b) Centralised Cache c) Secondary memory d) Primary memory e) None of these
UNIT-5
121. Input to the ______ is the sorted output of the mappers. a) Reducer b) Mapper c) Shuffle d) All of the above e) None of these
122. Which of the following statements about data streaming is true? a) Stream data is always unstructured data. b) Stream data often has a high velocity. c) Stream elements cannot be stored on disk. d) Stream data is always structured data. e) None of these
123. The output of the ______ is not sorted in the MapReduce framework for Hadoop. a) Mapper b) Cascader c) Scalding d) None of the above e) None of these
124. Which of the following phases occur simultaneously? a) Reduce and Sort b) Shuffle and Sort c) Shuffle and Map d) Sort and Reduce e) None of these
125. A ________ serves as the master and there is only one NameNode per cluster. a) Data Node b) NameNode c) Data block d) Replication e) None of these
126. HDFS works in a __________ fashion. a) master-worker b) master-slave c) worker/slave d) all of the mentioned e) None of these
127. ________ NameNode is used when the Primary NameNode goes down. a) Rack b) Data c) Secondary d) None of the mentioned e) None of these
128. Point out the wrong statement. a) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file level b) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode c) User data is stored on the local file system of DataNodes d) DataNode is aware of the files to which the blocks stored on it belong to e) None of these
129. The need for data replication can arise in various scenarios like ____________ a) Replication Factor is changed b) DataNode goes down c) Data Blocks get corrupted d) All of the mentioned e) None of these
130. For YARN, the ___________ Manager UI provides host and port information. a) Data Node b) NameNode c) Resource d) Replication e) None of these
131. HDFS works in a __________ fashion. a) worker-master fashion b) master-slave fashion c) master-worker fashion d) slave-master e) None of these
132. HDFS is implemented in the _____________ language. a) C b) Perl c) Python d) Java e) None of these
133. The default block size in Hadoop is ______. a) 16MB b) 32MB c) 64MB d) 128MB e) None of these
134. ____ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data. a) MapReduce b) Mahout c) Oozie d) Hbase e) None of these
135. Mapper and Reducer implementations can use the ______ to report progress or just indicate that they are alive. a) Partitioner b) OutputCollector c) Reporter d) All of the above e) None of these
136. ______ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer. a) Partitioner b) OutputCollector c) Reporter d) All of the above e) None of these
137. A ______ serves as the master and there is only one NameNode per cluster. a) Data Node b) NameNode c) Data block d) Replication e) None of these
138. HDFS works in a ______ fashion. a) master-worker b) master-slave c) worker/slave d) All of the above e) None of these
139. ______ NameNode is used when the Primary NameNode goes down. a) Rack b) Data c) Secondary d) None e) None of these
140. HDFS is implemented in the ______ programming language. a) C++ b) Java c) Scala d) None e) None of these
141. Hadoop was developed by _______________ a) Larry Page b) Doug Cutting c) Mark d) Bill Gates e) None of these
142. The MapReduce algorithm contains two important tasks, namely __________. a) mapped, reduce b) mapping, Reduction c) Map, Reduction d) Map, Reduce e) None of these
143. Mapper and Reducer classes extend classes from the ______ package. a) org.apache.hadoop.mapreduce b) apache.hadoop c) org.mapreduce d) hadoop.mapreduce e) None of these
144. HDFS is inherited from the ------------- file system. a) Yahoo b) FTFS c) Google d) Rediff e) None of these
145. ______ NameNode is used when the Primary NameNode goes down. a) Rack b) Data c) Secondary d) Primary e) None of these
146. HDFS works in a ______ fashion. a) master-worker b) master-slave c) worker/slave d) All of the above e) None of these
147. A ______ serves as the master and there is only one NameNode per cluster. a) Data Node b) NameNode c) Data block d) Replication e) None of these
148. HDFS provides a command line interface called ______ used to interact with HDFS. a) HDFS Shell b) FS Shell c) DFSA Shell d) No shell e) None of these
149. ______ is the slave/worker node and holds the user data in the form of Data Blocks. a) DataNode b) NameNode c) Data block d) Replication e) None of these
150. ______ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution. a) Map Parameters b) JobConf c) MemoryConf d) All of the above e) None of these
Data Analytics KIT-601 Answer key
UNIT-1   UNIT-2   UNIT-3    UNIT-4     UNIT-5
1-b      31-b     61-b      91-d       121-a
2-d      32-b     62-b      92-b       122-b
3-d      33-b     63-d      93-d       123-d
4-b      34-c     64-b      94-c       124-a
5-b      35-a     65-d      95-a       125-b
6-c      36-b     66-a      96-c       126-a
7-c      37-b     67-a      97-a       127-c
8-c      38-a     68-b      98-a       128-a
9-d      39-d     69-b      99-a       129-d
10-a     40-d     70-a      100-b      130-c
11-c     41-d     71-b      101-d      131-b
12-a     42-b     72-b      102-b      132-d
13-b     43-a     73-e      103-a      133-c
14-a     44-a     74-d      104-a      134-a
15-b     45-b     75-a      105-b      135-c
16-c     46-d     76-b      106-a,d    136-b
17-d     47-b     77-a,b    107-a      137-b
18-b     48-a     78-d      108-b      138-a
19-a     49-c     79-b      109-a      139-c
20-c     50-b     80-d      110-a      140-b
21-a     51-a     81-b      111-a,b    141-b
22-d     52-d     82-b      112-a      142-d
23-d     53-a     83-c,d    113-a      143-a
24-d     54-d     84-a,b    114-b      144-c
25-a     55-c     85-a      115-c,d    145-c
26-b     56-d     86-b      116-c      146-b
27-a     57-b     87-a,b    117-a,b    147-b
28-d     58-b     88-c      118-b      148-b
29-a     59-d     89-a,d    119-d      149-a
30-b     60-b     90-c      120-a      150-b
*************** Data Analytics MCQs Set - 1 ***************
1. The branch of statistics which deals with development of particular statistical methods
is classified as
1. industry statistics
2. economic statistics
3. applied statistics
4. applied statistics
Answer: applied statistics
2. Which of the following is true about regression analysis?
1. answering yes/no questions about the data
2. estimating numerical characteristics of the data
3. modeling relationships within the data
4. describing associations within the data
Answer: modeling relationships within the data
3. Text Analytics, also referred to as Text Mining?
1. True
2. False
3. Can be true or False
4. Can not say
Answer: True
4. What is a hypothesis?
1. A statement that the researcher wants to test through the data collected in a study.
2. A research question the results will answer.
3. A theory that underpins the study.
4. A statistical method for calculating the extent to which the results could have happened by
chance.
Answer: A statement that the researcher wants to test through the data collected in a study.
5. What is the cyclical process of collecting and analysing data during a single research
study called?
1. Interim Analysis
2. Inter analysis
3. inter item analysis
4. constant analysis
Answer: Interim Analysis
6. The process of quantifying data is referred to as ____
1. Topology
2. Digramming
3. Enumeration
4. coding
Answer: Enumeration
7. An advantage of using computer programs for qualitative data is that they _
1. Can reduce time required to analyse data (i.e., after the data are transcribed)
2. Help in storing and organising data
3. Make many procedures available that are rarely done by hand due to time constraints
4. All of the above
Answer: All of the Above
8. Boolean operators are words that are used to create logical combinations.
1. True
2. False
Answer: True
9. ______ are the basic building blocks of qualitative data.
1. Categories
2. Units
3. Individuals
4. None of the above
Answer: Categories
10. This is the process of transforming qualitative research data from written interviews or field notes into typed text.
1. Segmenting
2. Coding
3. Transcription
4. Mnemoning
Answer: Transcription
11. A challenge of qualitative data analysis is that it often includes data that are unwieldy and complex; it is a major challenge to make sense of the large pool of data.
1. True
2. False
Answer: True
12. Hypothesis testing and estimation are both types of descriptive statistics.
1. True
2. False
Answer: False
13. A set of data organised in a participants(rows)-by-variables(columns) format is known as a “data set.”
1. True
2. False
Answer: True
14. A graph that uses vertical bars to represent data is called a ___
1. Line graph
2. Bar graph
3. Scatterplot
4. Vertical graph
Answer: Bar graph
15. ____ are used when you want to visually examine the relationship between two
quantitative variables.
1. Bar graph
2. pie graph
3. line graph
4. Scatterplot
Answer: Scatterplot
16. The denominator (bottom) of the z-score formula is
1. The standard deviation
2. The difference between a score and the mean
3. The range
4. The mean
Answer: The standard deviation
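A small worked example in Python (made-up scores) showing the z-score formula with the standard deviation in the denominator:

scores = [62, 70, 75, 81, 92]
mean = sum(scores) / len(scores)
std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
z_scores = [(s - mean) / std for s in scores]   # z = (score - mean) / standard deviation
print([round(z, 2) for z in z_scores])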
17. Which of these distributions is used for a testing hypothesis?
1. Normal Distribution
2. Chi-Squared Distribution
3. Gamma Distribution
4. Poisson Distribution
Answer: Chi-Squared Distribution
18. A statement made about a population for testing purpose is called?
1. Statistic
2. Hypothesis
3. Level of Significance
4. Test-Statistic
Answer: Hypothesis
19. If the assumed hypothesis is tested for rejection considering it to be true is called?
1. Null Hypothesis
2. Statistical Hypothesis
3. Simple Hypothesis
4. Composite Hypothesis
Answer: Null Hypothesis
20. If the null hypothesis is false then which of the following is accepted?
1. Null Hypothesis
2. Positive Hypothesis
3. Negative Hypothesis
4. Alternative Hypothesis.
Answer: Alternative Hypothesis.
21. Alternative Hypothesis is also called as?
1. Composite hypothesis
2. Research Hypothesis
3. Simple Hypothesis
4. Null Hypothesis
Answer: Research Hypothesis
*************** Data Analytics MCQs Set – 2 ***************
1. What is the minimum no. of variables/ features required to perform clustering?
1. 0
2. 1
3. 2
4. 3
Answer: 1
2. For two runs of K-Mean clustering is it expected to get same clustering results?
1. Yes
2. No
Answer: No
3. Which of the following algorithms is most sensitive to outliers?
1. K-means clustering algorithm
2. K-medians clustering algorithm
3. K-modes clustering algorithm
4. K-medoids clustering algorithm
Answer: K-means clustering algorithm
4. The discrete variables and continuous variables are two types of
1. Open end classification
2. Time series classification
3. Qualitative classification
4. Quantitative classification
Answer: Quantitative classification
5. Bayesian classifiers is
1. A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
2. Any mechanism employed by a learning system to constrain the search space of a hypothesis
3. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
4. None of these
Answer: A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
6. Classification accuracy is
1. A subdivision of a set of examples into a number of classes
2. Measure of the accuracy, of the classification of a concept that is given by a certain theory
3. The task of assigning a classification to a set of examples
4. None of these
Answer: Measure of the accuracy, of the classification of a concept that is given by a certain theory
7. Euclidean distance measure is
1. A stage of the KDD process in which new data is added to the existing selection.
2. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
3. The distance between two points as calculated using the Pythagoras theorem
4. none of above
Answer: The distance between two points as calculated using the Pythagoras theorem
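A one-line illustration in Python of the Euclidean (Pythagorean) distance between two points:

import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))   # straight-line distance

print(euclidean((0, 0), (3, 4)))   # 5.0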
8. Hybrid is
1. Combining different types of method or information
2. Approach to the design of learning algorithms that is structured along the lines of the theory of evolution.
3. Decision support systems that contain an information base filled with the knowledge of an expert formulated in terms of if-then rules.
4. none of above
Answer: Combining different types of method or information
9. Decision trees use ________, in that they always choose the option that seems the best available at that moment.
1. Greedy Algorithms
2. divide and conquer
3. Backtracking
4. Shortest path algorithm
Answer: Greedy Algorithms
10. Discovery is
1. It is hidden within a database and can only be recovered if one is given certain clues (an example IS encrypted information).
2. The process of extracting implicit, previously unknown and potentially useful information from data
3. An extremely complex molecule that occurs in human chromosomes and that carries genetic
information in the form of genes.
4. None of these
Answer: The process of extracting implicit, previously unknown and potentially useful information from data
11. Hidden knowledge referred to
1. A set of databases from different vendors, possibly using different database paradigms
2. An approach to a problem that is not guaranteed to work but performs well in most cases
3. Information that is hidden in a database and that cannot be recovered by a simple SQL query.
4. None of these
Answer: Information that is hidden in a database and that cannot be recovered by a simple SQL query.
12. Decision trees cannot handle categorical attributes with many distinct values, such as country codes for telephone numbers.
1. True
2. False
Answer: False
13. Enrichment is
1. A stage of the KDD process in which new data is added to the existing selection
2. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
3. The distance between two points as calculated using the Pythagoras theorem.
4. None of these
Answer: A stage of the KDD process in which new data is added to the existing selection
14. ________ are easy to implement and can execute efficiently even without prior knowledge of the data; they are among the most popular algorithms for classifying text documents.
1. ID3
2. Naive Bayes classifiers
3. CART
4. None of above
Answer: Naive Bayes classifiers
15. High entropy means that the partitions in classification are
1. Pure
2. Not Pure
3. Useful
4. Useless
Answer: Not Pure
16. Which of the following statements about Naive Bayes is incorrect?
1. Attributes are equally important.
2. Attributes are statistically dependent of one another given the class value.
3. Attributes are statistically independent of one another given the class value.
4. Attributes can be nominal or numeric
Answer: Attributes are statistically dependent of one another given the class value.
17. The maximum value of entropy depends on the number of classes, so if we have 8 classes, what will be the maximum entropy?
1. Max Entropy is 1
2. Max Entropy is 2
3. Max Entropy is 3
4. Max Entropy is 4
Answer: Max Entropy is 3
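A quick check in Python: with 8 equally likely classes the entropy reaches its maximum, log2(8) = 3:

import math

k = 8
p = 1 / k
h_max = -sum(p * math.log2(p) for _ in range(k))   # -sum p*log2(p) over all classes
print(h_max, math.log2(k))                         # both print 3.0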
18. Point out the wrong statement.
1. k-nearest neighbor is same as k-means
2. k-means clustering is a method of vector quantization
3. k-means clustering aims to partition n observations into k clusters
4. none of the mentioned
Answer: k-nearest neighbor is same as k-means
19. Consider the following example: "How can we divide a set of articles such that those articles have the same theme (we do not know the theme of the articles ahead of time)?" Is this:
1. Clustering
2. Classification
3. Regression
4. None of these
Answer: Clustering
20. Can we use K Mean Clustering to identify the objects in video?
1. Yes
2. No
Answer: Yes
21. Clustering techniques are ________ in the sense that the data scientist does not determine, in advance, the labels to apply to the clusters.
1. Unsupervised
2. supervised
3. Reinforcement
4. Neural network
Answer: Unsupervised
22. The ________ metric is examined to determine a reasonably optimal value of k.
1. Mean Square Error
2. Within Sum of Squares (WSS)
3. Speed
4. None of these
Answer: Within Sum of Squares (WSS)
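A minimal sketch (made-up data, assuming scikit-learn is available) of examining the Within Sum of Squares for several values of k and looking for the elbow:

import numpy as np
from sklearn.cluster import KMeans

X = np.vstack([np.random.randn(30, 2) + c for c in ([0, 0], [6, 6], [0, 8])])
for k in range(1, 7):
    wss = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_   # WSS / inertia
    print(k, round(wss, 1))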
23. If an itemset is considered frequent, then any subset of the frequent itemset must also be frequent.
1. Apriori Property
2. Downward Closure Property
3. Either 1 or 2
4. Both 1 and 2
Answer: Both 1 and 2
24. If {bread, eggs, milk} has a support of 0.15 and {bread, eggs} also has a support of 0.15, the confidence of the rule {bread, eggs} -> {milk} is
1. 0
2. 1
3. 2
4. 3
Answer: 1
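The arithmetic behind this answer, as a two-line Python check:

support_xy = 0.15                      # support of {bread, eggs, milk}
support_x = 0.15                       # support of {bread, eggs}
print(support_xy / support_x)          # confidence({bread, eggs} -> {milk}) = 1.0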
25. Confidence is a measure of how X and Y are really related rather than coincidentally happening together.
1. True
2. False
Answer: False
26. ________ recommend items based on similarity measures between users and/or items.
1. Content Based Systems
2. Hybrid System
3. Collaborative Filtering Systems
4. None of these
Answer: Collaborative Filtering Systems
27. There are ________ major classifications of Collaborative Filtering Mechanisms
1. 1
2. 2
3. 3
4. none of above
Answer: 2
28. Movie Recommendation to people is an example of
1. User Based Recommendation
2. Item Based Recommendation
3. Knowledge Based Recommendation Join:- https://t.me/AKTU_Notes_Books_Quantum
4. content based recommendation
Answer: Item Based Recommendation
29. ________ recommenders rely on an explicitly defined set of recommendation rules
1. Constraint Based
2. Case Based
3. Content Based
4. User Based
Answer: Case Based
30. Parallelized hybrid recommender systems operate dependently of one another and produce separate recommendation lists.
1. True
2. False
Answer: False
Department of IT
COURSE B.Tech., VI SEM, MCQ Assignment (2020-21) Even Semester, UNIT 1, Data Analytics (KIT601)
1. The data with no pre-defined organizational form or specific format is
a. Semi-structured data b. Unstructured data c. Structured data d. None of these
Ans. b
2. The data which can be ordered or ranked according to some relationship to one another is
a. Categorical data b. Interval data c. Ordinal data d. Ratio data
Ans. c
3. Predict the future by examining historical data, detecting patterns or relationships in these data, and then extrapolating these relationships forward in time. a. Prescriptive model b. Descriptive model c. Predictive model d. None of these
Ans. c
4. Person responsible for the genesis of the project, providing the impetus for the project and core business problem, generally provides the funding and will gauge the degree of value from the final outputs of the working team is a. Business User b. Project Sponsor c. Business Intelligence Analyst d. Data Engineer
Ans. b
5. Deep technical skills to assist with tuning SQL queries for data management, extraction and support data ingest to analytic sandbox is handled by ___________. a. Data Engineer b. Business User c. Project Sponsor d. Business Intelligence Analyst
Ans. a
6. Business domain expertise with deep understanding of the data, KPIs, key metrics and business intelligence from a reporting perspective is key role of ____________.
a. Business User b. Project Sponsor c. Business Intelligence Analyst d. Data Engineer
Ans. c
7. _____________ is concerned with uncertainty or inaccuracy of the data.
a. Volume b. Velocity c. Variety d. Veracity
Ans. d
8. What are the V's in the characteristics of Big data? a. Volume b. Velocity c. Variety d. All of these
Ans. d
9. What are the types of reporting in data analytics?
a. Canned reports b. Dashboard reports c. Alert reports d. All of above
Ans. d
10. Massive Parallel Processing (MPP) database breaks the data into independent chunks with independent disk and CPU resources.
a. True b. False
Ans. True
11. The process of exploring data and reports in order to extract meaningful insights, which can be used to better understand and improve business performance.
a. Reporting b. Analysis c. Summarizing d. None of these
Ans. b
Ans. a
12. The key components of an analytical sandbox are: (i) Business analytics (ii) Analytical sandbox platform (iii) Data access and delivery (iv) Data sources
a. True b. False
Ans. b
13. The ____________ phase: learn the business domain, including relevant history, such as whether the organization or business unit has attempted similar projects in the past, from which you can learn. Assess the resources you will have to support the project, in terms of people, technology, time, and data. Frame the business problem as an analytic challenge that can be addressed in subsequent phases. Formulate initial hypotheses (IH) to test and begin learning the data.
a. Data preparation b. Discovery c. Data Modelling d. Data Building
Ans. b
14. Which phase: Prepare an analytic sandbox, in which you can work for the duration of the project. Perform ELT and ETL to get data into the sandbox, and begin transforming the data so you can work with it and analyze it. Familiarize yourself with the data thoroughly and take steps to condition the data.
a. Data preparation b. Discovery c. Data Modelling d. Data Building
Ans. a
15. Which phase uses SQL, Python, R, or Excel to perform various data modifications and transformations?
a. Data preparation b. Data cleaning c. Data Modelling d. Data Building
Ans. a
16. By definition, Database Administrator is a person who ___________
a. Provisions and configures database environment to support the analytical needs of the working team. b. Ensure key milestones and objectives are met on time and at expected quality. c. Deep technical skills to assist with tuning SQL queries for data management, extraction and support data ingest to analytic sandbox. d. None of these
Ans. a
Ans. c
Ans. b
Ans .b
17. ETL stands for
a. Extract, Load, Transform b. Evaluate, Transform ,Load c. Extract , Loss , Transform d. None of the above
18. The phase Develop data sets for testing, training, and production purposes. Get the best environment you can for executing models and workflows, including fast hardware and parallel processing is referred to as
a. Data preparation b. Discovery c. Data Modelling d. Data Building
19. Which of the following is not a major data analysis approaches?
a. Data Mining b. Predictive Intelligence c. Business Intelligence d. Text Analytics
20. User rating given to a movie in a scale 1-10, can be considered as an attribute of type?
a. Nominal b. Ordinal c. Interval d. Ratio
Ans. d
22. The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
a. TRUE b. FALSE c. Can be true or false d. Cannot say
Ans. a
Ans. b
Ans.b
25. The Process of describing the data that is huge and complex to store and process is known as
a. Analytics b. Data mining c. Big Data d. Data Warehouse
21. Data Analysis is defined by the statistician?
a. William S. b. Hans Peter Luhn c. Gregory Piatetsky-Shapiro d. John Tukey
23. Which of the following is not a major data analysis approaches?
a. Data Mining b. Predictive Intelligence c. Business Intelligence d. Text Analytics
24. Which of the following step is performed by data scientist after acquiring the data?
a. Data Cleansing b. Data Integration c. Data Replication d. All of the mentioned
Ans. c
26. Data generated from online transactions is one of the example for volume of big data. Is this true or False. a. TRUE b. FALSE
Ans. a
27. Velocity is the speed at which the data is processed
a. TRUE b. FALSE
Ans. b
28. _____________ have a structure but cannot be stored in a database.
a. Structured b. Semi-Structured c. Unstructured d. None of these
Ans. b
29. ____________refers to the ability to turn your data useful for business.
a. Velocity b. Variety c. Value d. Volume
Ans. c
30. Value tells the trustworthiness of data in terms of quality and accuracy.
a. TRUE b. FALSE
Ans.b
NPTEL Questions
31. Analysing the data to answer why some phenomenon related to learning happened is a type of
a. Descriptive Analytics b. Diagnostic Analytics
c. Predictive Analytics d. Prescriptive Analytics
Ans. B
32. Analysing the data to answer what will happen next is a type of
a. Descriptive Analytics b. Diagnostic Analytics c. Predictive Analytics d. Prescriptive Analytics
Ans. C
33. Learning analytics at institutions/University, regional or national level is termed as
a. Educational data mining b. Business intelligence c. Academic analytics d. None of the above
Ans. C
34. Which of the following questions is not a type of Predictive Analytics?
a. What is the average score of all students in the CBSE 10th Maths Exam? b. What will be the performance of a students in next questions? c. Which courses will the student take in the next semester? d. What is the average attendance of the class over the semester
Ans A,D
35. A course instructor has data about students' attendance in her course in the past semester. Based on this data, she constructs a line graph. What type of analytics is she doing?
a. Descriptive Analytics b. Diagnostic Analytics c. Predictive Analytics d. Prescriptive Analytics
Ans. A
36. She then correlates the attendance with their final exam scores. She realizes that students who score 90% and above also have an attendance of more than 75%. What type of analytics is she doing?
a. Descriptive Analytics b. Diagnostic Analytics c. Predictive Analytics d. Prescriptive Analytics
Ans. B
38. Why should one not go for sampling?
a. Less costly to administer than a census. b. The person authorizing the study is comfortable with the sample. c. Because the research process is sometimes destructive d. None of the above
Ans. d
39. Stratified random sampling is a method of selecting a sample in which:
a. the sample is first divided into strata, and then random samples are taken from each stratum b. various strata are selected from the sample c. the population is first divided into strata, and then random samples are drawn from each stratum d. None of these alternatives is correct.
Ans. c
SET II
1. Data Analysis is defined by the statistician?
a. William S. b. Hans Peter Luhn c. Gregory Piatetsky-Shapiro d. John Tukey
Ans D
2. What is classification?
a) deciding what features to use in a pattern recognition problem b) deciding what class an input pattern belongs to c) deciding what type of neural network to use d) none of the mentioned
Ans. B
3. Data in ___________ bytes size is called Big Data.
A. Tera B. Giga C. Peta D. Meta
Ans : C
Explanation: Data in Peta bytes, i.e. 10^15 bytes in size, is called Big Data.
4. How many V's of Big Data?
A. 2 B. 3 C. 4 D. 5
Ans : D
Explanation: Big Data was defined by the “3Vs” but now there are “5Vs” of Big Data which are Volume, Velocity, Variety, Veracity, Value
5. Transaction data of the bank is?
A. structured data B. unstructured datat C. Both A and B D. None of the above
Ans : A
Explanation: Data which can be saved in tables is structured data, like the transaction data of the bank.
6. In how many forms can Big Data be found?
A. 2 B. 3 C. 4 D. 5
Ans : B
Explanation: Big Data can be found in three forms: structured, unstructured and semi-structured.
7. Which of the following are Benefits of Big Data Processing?
A. Businesses can utilize outside intelligence while taking decisions B. Improved customer service C. Better operational efficiency D. All of the above
Ans : D
Explanation: All of the above are Benefits of Big Data Processing.
8. Which of the following are incorrect Big Data Technologies?
A. Apache Hadoop B. Apache Spark C. Apache Kafka D. Apache Pytarch
Ans : D
Explanation: Apache Pytarch is not a Big Data technology.
9. The overall percentage of the world's total data that has been created just within the past two years is?
A. 80% B. 85% C. 90% D. 95%
Ans : C
Explanation: 90% of the world's total data has been created just within the past two years.
10. Apache Kafka is an open-source platform that was created by?
A. LinkedIn B. Facebook
C. Google D. IBM
Ans : A
Explanation: Apache Kafka is an open-source platform that was created by LinkedIn in the year 2011.
11. What was Hadoop named after?
A. Creator Doug Cutting’s favorite circus act B. Cuttings high school rock band C. The toy elephant of Cutting’s son D. A sound Cutting’s laptop made during Hadoop development
Ans : C
Explanation: Doug Cutting, Hadoop creator, named the framework after his child’s stuffed toy elephant. 12. What are the main components of Big Data?
A. MapReduce B. HDFS C. YARN D. All of the above
Ans : D
Explanation: All of the above are the main components of Big Data.
13. Point out the correct statement.
A. Hadoop do need specialized hardware to process the data B. Hadoop 2.0 allows live stream processing of real time data C. In Hadoop programming framework output files are divided into lines or records D. None of the above
Ans : B
Explanation: Hadoop batch processes data distributed over a number of computers ranging in the 100s and 1000s.
14. Which of the following fields come under the umbrella of Big Data?
A. Black Box Data B. Power Grid Data
C. Search Engine Data D. All of the above
Ans : D
Explanation: All options are the fields come under the umbrella of Big Data.
15. Which of the following is not an example of Social Media? 1. Twitter 2. Google 3. Instagram 4. Youtube
Ans: 2 (Google)
16. By 2025, the volume of digital data will increase to 1. TB 2. YB 3. ZB 4. EB Ans: 3 ZB
17. Data Analysis is a process of 1. inspecting data 2. cleaning data 3. transforming data 4. All of Above
Ans. 4 All of above
18. Which of the following is not a major data analysis approaches? 1. Data Mining 2. Predictive Intelligence 3. Business Intelligence 4. Text Analytics
Ans. 2 Predictive Intelligence
19. The Process of describing the data that is huge and complex to store and process is known as 1. Analytics 2. Data mining 3. Big data 4. Data warehouse
Ans. 3 Big data
20. In descriptive statistics, data from the entire population or a sample is summarized with ?
1. Integer descriptor 2. floating descriptor 3. numerical descriptor 4. decimal descriptor
Ans. 3 numerical descriptor
21. Data generated from online transactions is one of the example for volume of big data 1. TRUE 2. FALSE
TRUE
22. Velocity is the speed at which the data is processed 1. True 2. False
False
23. Value tells the trustworthiness of data in terms of quality and accuracy 1. TRUE 2. FALSE
False
24. Hortonworks was introduced by Cloudera and owned by Yahoo 1. True 2. False
False
25. ____ refers to the ability to turn your data useful for business 1. Velocity 2. variety 3. Value 4. Volume
Ans. 3 Value
26. Data Analysis is defined by the statistician? 1. William S. 2. Hans Peter Luhn 3. Gregory Piatetsky-Shapiro 4. John Tukey
Ans. 4 John Tukey
27. Files are divided into ____ sized Chunks. 1. Static 2. Dynamic 3. Fixed 4. Variable
Ans. 3 Fixed
28. _____ is an open source framework for storing data and running application on clusters of commodity hardware. 1. HDFS 2. Hadoop 3. MapReduce 4. Cloud
Ans. 2 Hadoop
29. ____ is factors considered before Adopting Big Data Technology 1. Validation 2. Verification 3. Data 4. Design
Ans. 1 Validation
30. Which among the following is not a Data mining and analytical applications? 1. profile matching 2. social network analysis 3. facial recognition 4. Filtering
Ans. 4 Filtering
31. Which storage subsystem can support massive data volumes of increasing size. 1. Extensibility 2. Fault tolerance 3. Scalability 4. High-speed I/O capacity
Ans. 3 Scalability
32. ______ is a programming model for writing applications that can process Big Data in parallel on multiple nodes.
1. HDFS 2. MAP REDUCE 3. HADOOP 4. HIVE Ans. MAP REDUCE
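A toy word-count sketch in pure Python (illustrative only) showing the Map, shuffle/group and Reduce steps of the MapReduce model:

from collections import defaultdict
from itertools import chain

docs = ["big data is big", "map reduce processes big data"]
mapped = chain.from_iterable(((w, 1) for w in d.split()) for d in docs)   # Map: emit (word, 1)
groups = defaultdict(list)
for key, value in mapped:                                                 # Shuffle: group by key
    groups[key].append(value)
print({word: sum(values) for word, values in groups.items()})             # Reduce: sum per word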
33. How many main statistical methodologies are used in data analysis?
A. 2 B. 3 C. 4 D. 5
Ans : A
Explanation: In data analysis, two main statistical methodologies are used: descriptive statistics and inferential statistics.
34. The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
A. TRUE B. FALSE C. Can be true or false D. Can not say
Ans : A
Explanation: The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
35. The branch of statistics which deals with development of particular statistical methods is classified as 1. industry statistics 2. economic statistics 3. applied statistics 4. applied statistics
Ans. applied statistics
36. Point out the correct statement. a) Descriptive analysis is first kind of data analysis performed b) Descriptions can be generalized without statistical modelling
c) Description and Interpretation are same in descriptive analysis d) None of the mentioned
Answer: b. Explanation: Descriptive analysis describes a set of data.
37. What are the five V’s of Big Data?
A. Volume
B. Velocity
C. Variety
D. All the above
Answer: Option D
38. What are the main components of Big Data?
A. MapReduce
B. HDFS
C. YARN
D. All of these
Answer: Option D
39. What are the different features of Big Data Analytics?
A. Open-Source
B. Scalability
C. Data Recovery
D. All the above
Answer: Option D
40. Which of the following refers to the problem of finding abstracted patterns (or structures) in the unlabeled data?
A. Supervised learning
B. Unsupervised learning
C. Hybrid learning
D. Reinforcement learning
Answer: B
Explanation: Unsupervised learning is a type of machine learning algorithm that is generally used to find the hidden structured and patterns in the given unlabeled data.
41. Which one of the following refers to querying the unstructured textual data?
A. Information access
B. Information update
C. Information retrieval
D. Information manipulation
Answer: C
Explanation: Information retrieval refers to querying the unstructured textual data. We can also understand information retrieval as an activity (or process) in which the tasks of obtaining information from system recourses that are relevant to the information required from the huge source of information.
42. For what purpose, the analysis tools pre-compute the summaries of the huge amount of data?
A. In order to maintain consistency
B. For authentication
C. For data access
D. To obtain the queries response
Answer: d
Explanation: Whenever a query is fired, its response must be produced very quickly. So, for the query response, the analysis tools pre-compute summaries of the huge amount of data in advance. For example, when you write a keyword in Google search, Google's analytical tools use pre-computed summaries of large amounts of data to provide a quick output related to the keyword you have written.
43. Which one of the following statements is not correct about the data cleaning?
a. It refers to the process of data cleaning
b. It refers to the transformation of wrong data into correct data
c. It refers to correcting inconsistent data
d. All of the above
Answer: d
Explanation: Data cleaning is a kind of process that is applied to data set to remove the noise from the data (or noisy data), inconsistent data from the given data. It also involves the process of transformation where wrong data is transformed into the correct data as well. In other words, we can also say that data cleaning is a kind of pre-process in which the given set of data is prepared for the data warehouse.
44. Any data with unknown form or the structure is classified as _ data. a. Structured b. Unstructured c. Semi-structured d. None of above Ans. b
45.____ means relating to the issuing of reports. a. Analysis b. Reporting c. Reporting and Analysis d. None of the above
Ans. b
46. Veracity involves the reliability of the data; this is ________ due to the numerous data sources of big data. a) Easy and difficulty b) Easiness c) Demanding d) None of these
Ans. c 47. ____is a process of defining the measurement of a phenomenon that is not directly measurable, though its existence is implied by other phenomena. a. Data preparation b. Model planning c. Communicating results d. Operationalization
Ans. d
48. _____data is data whose elements are addressable for effective analysis.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. a
49. ______data is information that does not reside in a relational database but that have some organizational properties that make it easier to analyze.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. b
50. ______data is a data which is not organized in a predefined manner or does not have a predefined data model, thus it is not a good fit for a mainstream relational database.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. c
51. There are ___ types of big data.
a. 2 b. 3 c. 4 d. 5
Ans. b
52. Google search is an example of _________ data.
a. Structured b. Semi-structured c. Unstructured d. None of the above
Ans. c
Department of IT
COURSE B.Tech., VI SEM, MCQ Assignment (2020-21) Even Semester UNIT 2 DataAnalytics(KIT601)
1. Maximum aposteriori classifier is also known as: a. Decision tree classifier b. Bayes classifier c. Gaussian classifier d. Maximum margin classifier
Ans. B
2. Which of the following sentence is FALSE regarding regression?
a. It relates inputs to outputs. b. It is used for prediction. c. It may be used for interpretation. d. It discovers causal relationships.
Ans. d
3. Suppose you are working on stock market prediction, and you would like to predict the price of a particular stock tomorrow (measured in dollars).
You want to use a learning algorithm for this.
a. Regression b. Classification c. Clustering d. None of these
Ans. a
4. In binary logistic regression:
a. The dependent variable is divided into two equal subcategories. b. The dependent variable consists of two categories. c. There is no dependent variable. d. The dependent variable is continuous.
Ans. b
5. A fair six-sided die is rolled twice. What is the probability of getting 4 on the first roll and not getting 6 on the second roll?
a. 1/36 b. 5/36 c. 1/12 d. 1/9
Ans. b
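Worked check: the two rolls are independent, so P(4 on the first roll) x P(not 6 on the second roll) = (1/6) x (5/6) = 5/36, which is option b.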
6. The parameter β0 is termed as intercept term and the parameter β1 is termed as slope parameter. These parameters are usually called as _________
a Regressionists b. Coefficients c. Regressive d. Regression coefficients
Ans. d
7. ________ is a simple approach to supervised learning. It assumes that the dependence of Y on X1, X2… Xp is linear.
a. Gradient Descent b. Linear regression
c. Logistic regression d. Greedy algorithms
Ans. b
8. What makes the interpretation of conditional effects extra challenging in logistic regression?
a. It is not possible to model interaction effects in logistic regression b. The maximum likelihood estimation makes the results unstable c. The conditional effect is dependent on the values of all X-variables d. The results has to be raised by its natural logarithm.
Ans. c 9. If there were a perfect positive correlation between two interval/ratio variables, the Pearson's r test would give a correlation coefficient of:
a. - 0.328 b. +1 c. +0.328 d. – 1
Ans.b
10. Logistic Regression transforms the output probability to in a range of [0, 1]. Which of the following function is used for this purpose?
a. Sigmoid b. Mode c. Square d. All of these
Ans.a
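For reference, a minimal Python sketch of the sigmoid function used by logistic regression to squash scores into [0, 1] (illustrative only; the sample inputs are arbitrary):

import math

def sigmoid(z):
    # Maps any real-valued score z into the (0, 1) range,
    # which is why logistic regression uses it to output probabilities.
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(-4), sigmoid(0), sigmoid(4))   # ~0.018, 0.5, ~0.982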
12. Generally which of the following method(s) is used for predicting continuous dependent variable?
1. Linear Regression 2. Logistic Regression
a. 1 and 2
b. only 1 c. only 2 d. None of these
Ans.b
13. Mean of the set of numbers {1, 2, 3, 4, 5} is?
a. 2 b. 3 c. 4 d. 5
Ans.b
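Worked check: (1 + 2 + 3 + 4 + 5) / 5 = 15 / 5 = 3, which is option b.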
14. Name of a movie, can be considered as an attribute of type?
a. Nominal
b. Ordinal
c. Interval
d. Ratio
Ans.a
15. Let A be an example, and C be a class. The probability P(C) is known as:
a. Apriori probability
b. Aposteriori probability
c. Class conditional probability
d. None of the above
Ans.a
16. Consider two binary attributes X and Y. We know that the attributes are independent and probability P(X=1) = 0.6, and P(Y=0) = 0.4. What is the probability that both X and Y have values 1?
a. 0.06 b. 0.16 c. 0.26 d. 0.36
Ans. d
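Worked check: P(Y = 1) = 1 - P(Y = 0) = 0.6, and by independence P(X = 1, Y = 1) = P(X = 1) x P(Y = 1) = 0.6 x 0.6 = 0.36, which is option d.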
17. In regression the output is a. Discrete b. Continuous c. Continuous and always lie in same range d. May be discrete and continuous
Ans. b
18. The probabilistic model that finds the most probable prediction using the training data and space of hypotheses to make a prediction for a new data instance.
a. Concept learning b. Bayes optimal classifier c. EM algorithm d. Logistic regression
Ans. b
19. State whether the following condition is true or not: “In Bayesian theorem, it is important to find the probability of both the events occurring simultaneously”
a. True b. False
Ans. b 20. If the correlation coefficient is a positive value, then the slope of the regression line
a. can be either negative or positive
b. must also be positive c. can be zero d. cannot be zero
Ans. b
21. Which of the following is true about Naive Bayes?
a. Assumes that all the features in a dataset are equally important b. Assumes that all the features in a dataset are independent c. Assumes that all the features in a dataset are equally important and are independent. d. None of the above options
Ans. c
22. Previous probabilities in Bayes Theorem that are changed with help of new available information are classified as _______
a. independent probabilities b. posterior probabilities c. interior probabilities d. dependent probabilities
Ans. b
23. Which of the following methods do we use, to find the best fit line for data in Linear Regression?
a. Least Square Error b. Maximum Likelihood c. Logarithmic Loss d. Both A and B
Ans. a
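For reference, a minimal Python sketch of the least-squares fit for a simple linear regression line (illustrative only; the sample data and names are assumptions, not taken from the question bank):

def least_squares_fit(xs, ys):
    # Ordinary least squares for y = b0 + b1*x:
    # b1 = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2), b0 = mean_y - b1*mean_x
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
    b0 = mean_y - b1 * mean_x
    return b0, b1

print(least_squares_fit([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]))   # intercept ~0.15, slope ~1.94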
24. What is the consequence between a node and its predecessors while creating Bayesian network?
a. Conditionally dependent b. Dependent c. Conditionally independent d. Both a & b
Ans. c 25. Bayes rule can be used to __________conditioned on one piece of evidence.
a. Solve queries b. Answer probabilistic queries c. Decrease complexity of queries d. Increase complexity of queries
Ans.b
26. Which of the following options is/are correct in reference to Bayesian Learning?
a. New instances can be classified by combining the predictions of multiple hypotheses, weighted by their probabilities. b. Bayesian methods can accommodate hypotheses that make probabilistic predictions. c. Each observed training example can incrementally decrease or increase the estimated probability that a hypothesis is correct. d. All of the mentioned
Ans. d
27. When is the cell said to be fired? a. if the potential of the body reaches a steady threshold value b. if there is an impulse reaction c. during the upbeat of heart d. none of the mentioned
Ans.a 28. Which of the following is true about regression analysis?
a. answering yes/no questions about the data b. estimating numerical characteristics of the data c. modeling relationships within the data d. describing associations within the data
Ans.c
29. Suppose you are building a SVM model on data X. The data X can be error prone which means that you should not trust any specific data point too much. Now think that you want to build a SVM model which has quadratic kernel function of polynomial degree 2 that uses Slack variable C as one of its hyper parameter. Based upon that give the answer for following question. What would happen when you use very large value of C?
a. We can still classify data correctly for given setting of hyper parameter C b. We cannot classify data correctly for given setting of hyper parameter C. c. Can’t Say
d. None of these
Ans. a
30. What is/are true about kernel in SVM?
(a) Kernel functions map low dimensional data to high dimensional space (b) It's a similarity function
a. Kernel functions map low dimensional data to high dimensional space b. It's a similarity function c. Kernel functions map low dimensional data to high dimensional space and it's a similarity function d. None of these
Ans. c
31. Suppose you have trained an SVM with a linear decision boundary and, after training, you correctly infer that your SVM model is underfitting. Which of the following options would you be more likely to consider for the next iteration of the SVM? a. You want to increase your data points. b. You want to decrease your data points. c. You will try to calculate more variables. d. You will try to reduce the features.
Ans. c
32. Suppose you are using RBF kernel in SVM with high Gamma value. What does this signify?
a. The model would consider even far away points from hyperplane for modeling b. The model would consider only the points close to the hyperplane for modeling. c. The model would not be affected by distance of points from hyperplane for modeling. d. None of these
Ans.b
33. Which of the following can only be used when training data are linearly separable?
a. Linear Logistic Regression. b. Linear Soft margin SVM c. Linear hard-margin SVM d. Parzen windows.
Ans.c
34. Using the kernel trick, one can get non-linear decision boundaries using algorithms designed originally for linear models.
a. True b. False
Ans. a
35. Support vectors are the data points that lie closest to the decision surface.
a. True b. False
Ans. a
36. Which of the following statement is true for a multilayered perceptron?
a. Output of all the nodes of a layer is input to all the nodes of the next layer b. Output of all the nodes of a layer is input to all the nodes of the same layer c. Output of all the nodes of a layer is input to all the nodes of the previous layer d. Output of all the nodes of a layer is input to all the nodes of the output layer
Ans. a
37. Which of the following is/are true regarding an SVM?
a. For two dimensional data points, the separating hyperplane learnt by a linear SVM will be a straight line. b. In theory, a Gaussian kernel SVM cannot model any complex separating hyperplane. c. For every kernel function used in a SVM, one can obtain an equivalent closed form basis expansion. d. Overfitting in an SVM is not a function of number of support vectors.
Ans. a
38. The function of distance that is used to determine the weight of each training example in instance based learning is known as______________
a. Kernel Function b. Linear Function c. Binomial distribution d. All of the above
Ans. a 39. What is the name of the function in the following statement “A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain value, it outputs a 1, otherwise it just outputs a 0”?
a. Step function b. Heaviside function c. Logistic function d. Binary function
Ans. b
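A minimal Python sketch of the perceptron rule described in the question, using a step (Heaviside) activation; the AND-gate weights and threshold below are assumptions chosen purely for illustration:

def heaviside(x):
    # Step activation: output 1 once the weighted sum reaches the threshold, else 0.
    return 1 if x >= 0 else 0

def perceptron(inputs, weights, threshold):
    # Adds up all the weighted inputs and compares the total against a threshold.
    total = sum(w * x for w, x in zip(weights, inputs))
    return heaviside(total - threshold)

# Toy AND gate: fires (outputs 1) only when both inputs are 1.
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", perceptron([a, b], [1, 1], 1.5))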
40. Which of the following is true? (i) On average, neural networks have higher computational rates than conventional computers. (ii) Neural networks learn by example. (iii) Neural networks mimic the way the human brain works.
a. All of the mentioned are true b. (ii) and (iii) are true c. (i) and (ii) are true d. Only (i) is true
Ans. a
41. Which of the following is an application of NN (Neural Network)?
a. Sales forecasting b. Data validation c. Risk management d. All of the mentioned
Ans. d
42. A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain value, it outputs a 1, otherwise it just outputs a 0.
a. True b. False
Ans. a
43. In what ways can output be determined from activation value in ANN?
a. Deterministically
b. Stochastically c. both deterministically & stochastically d. none of the mentioned
Ans. c
45. In ANN, the amount of output of one unit received by another unit depends on what?
a. output unit b. input unit c. activation value d. weight
Ans. d
46. Function of dendrites in ANN is
a. receptors b. transmitter c. both receptor & transmitter d. none of the mentioned
Ans. a
47. Which of the following is true? (i) On average, neural networks have higher computational rates than conventional computers. (ii) Neural networks learn by example. (iii) Neural networks mimic the way the human brain works.
a. All of the mentioned are true b. (ii) and (iii) are true c. (i), (ii) and (iii) are true d. Only (i) is true
Ans. a 48. What is the name of the function in the following statement “A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain value, it outputs a 1, otherwise it just outputs a 0”?
a. Step function b. Heaviside function
c. Logistic function d. Binary function
Ans. b
49. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the constant of proportionality equal to 2. The inputs are 4, 10, 5 and 20 respectively. The output will be
a. 238 b. 76 c. 119 d. 123
Ans. a
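Worked check: output = 2 x (1x4 + 2x10 + 3x5 + 4x20) = 2 x (4 + 20 + 15 + 80) = 2 x 119 = 238, which is option a.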
50. Which of the following are real world applications of the SVM?
a. Text and Hypertext Categorization b. Image Classification c. Clustering of News Articles d. All of the above
Ans.d
51. Support vector machine may be termed as:
a. Maximum apriori classifier
b. Maximum margin classifier
c. Minimum apriori classifier
d. Minimum margin classifier
Ans.b
52. What is the purpose of the axon? a. receptors b. transmitter c. transmission d. none of the mentioned
Ans. c
53. The model developed from sample data having the form ŷ = b0 + b1x is known as: Ans. C – estimated regression equation
54. In regression analysis, which of the following is not a required assumption about the error term ε?
Ans. A – The expected value of the error term is one
55. ____________ are algorithms that learn from their more complex environments (hence eco) to generalize, approximate and simplify solution logic.
a. Fuzzy Relational DB
b. Ecorithms
c. Fuzzy Set
d. None of the mentioned
Ans. b
56. The truth values of traditional set theory is ____________ and that of fuzzy set is __________
a. Either 0 or 1, between 0 & 1
b. Between 0 & 1, either 0 or 1
c. Between 0 & 1, between 0 & 1
d. Either 0 or 1, either 0 or 1
Ans. a
57. What is the form of Fuzzy logic?
a. Two-valued logic
b. Crisp set logic
c. Many-valued logic
d. Binary set logic
Ans. c
58. Fuzzy logic is usually represented as ___________
a. IF-THEN rules
b. IF-THEN-ELSE rules
c. Both IF-THEN-ELSE rules & IF-THEN rules
d. None of the mentioned
Ans. a
59. ______________ is/are the way/s to represent uncertainty.
a. Fuzzy Logic
b. Probability
c. Entropy
d. All of the mentioned
Ans.d
60. Fuzzy Set theory defines fuzzy operators. Choose the fuzzy operators from the following.
a. AND
b. OR
c. NOT
d. All of mentioned
Ans. d
61. The values of the set membership is represented by ___________
a. Discrete Set
b. Degree of truth
c. Probabilities
d. Both Degree of truth & Probabilities
Ans. b
62. Fuzzy logic is extension of Crisp set with an extension of handling the concept of Partial Truth.
a. True
b. False
Ans. a
SET II
1. Sentiment Analysis is an example of 1. Regression 2. Classification 3. clustering 4. Reinforcement Learning
1. 1, 2 and 4 2. 1, 2 and 3 3. 1 and 3 4. 1 and 2 Ans. 1, 2 and 4
2. The self-organizing maps can also be considered as the instance of _________ type of learning.
A. Supervised learning B. Unsupervised learning C. Missing data imputation D. Both A & C
Answer: B Explanation: The Self Organizing Map (SOM), or the Self Organizing Feature Map is a kind of Artificial Neural Network which is trained through unsupervised learning.
3. The following statement can be considered as an example of _________
Suppose one wants to predict the number of newborns according to the size of storks' population by performing supervised learning
A. Structural equation modeling B. Clustering C. Regression D. Classification
Answer: C
Explanation: The above-given statement can be considered as an example of regression. Therefore the correct answer is C.
4. In the example predicting the number of newborns, the final number of total newborns can be considered as the _________
A. Features B. Observation C. Attribute
D. Outcome
Answer: D
Explanation: In the example of predicting the total number of newborns, the final number of newborns is the outcome; it is what the model's output represents.
5. Which of the following statement is true about the classification?
A. It is a measure of accuracy B. It is a subdivision of a set C. It is the task of assigning a classification D. None of the above
Answer: B
Explanation: The term "classification" refers to the classification of the given data into certain sub-classes or groups according to their similarities or on the basis of the specific given set of rules.
6. Which one of the following correctly refers to the task of the classification?
A. A measure of the accuracy, of the classification of a concept that is given by a certain theory B. The task of assigning a classification to a set of examples C. A subdivision of a set of examples into a number of classes D. None of the above
Answer: B
Explanation: The task of classification refers to assigning a class (label) to a set of examples. Therefore the correct answer is B.
7. _____is an observation which contains either very low value or very high value in comparison to other observed values. It may hamper the result, so it should be avoided. a. Dependent Variable b. Independent Variable c. Outlier Variable d. None of the above Ans. c
8. _______is a type of regression which models the non-linear dataset using a linear model.
a. Polynomial Regression b. Logistic Regression c. Linear Regression d. Decision Tree Regression
Ans. a
9. The prediction of the weight of a person when his height is known, is a simple example of regression. The function used in R language is_____.
a. lm() b. print() c. predict() d. summary()
Ans. c
10. The following is the syntax of the lm() function in multiple regression.
lm(y ~ x1+x2+x3...., data) a. y is predictor and x1,x2,x3 are the dependent variables. b. y is dependent and x1,x2,x3 are the predictors. c. data is predictor variable. d. None of the above.
Ans. b
11. _______is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph.
a. A Bayesian network b. Bayes Network c. Bayesian Model d. All of the above
Ans. d
12. In support vector regression, _____is a function used to map lower dimensional data into higher dimensional data
A) Boundary line B) Kernel C) Hyper Plane D) Support Vector Ans. B
13. If the independent variables are highly correlated with each other, then such a condition is called ___________ a) outlier b) Multicollinearity c) under fitting d) independent variable
Ans. b
14. The Bayesian network graph does not contain any cyclic graph. Hence, it is known as a ____ or_____.
a. Directed Acyclic Graph or DAG b. Directed Cyclic Graph or DCG. c. Both the above. d. None of the above.
Ans. a
15. The hyperplane with maximum margin is called the ______ hyperplane. a. Non-optimal b. Optimal c. None of the above d. Requires one more option
Ans. b
16. One more _____ is needed for non-linear SVM.
a. Dimension b. Attribute c. Both the above d. None of the above
Ans. a
17. A subset of the dataset used to train the machine learning model, for which we already know the output, is called the
a. Training set b. Test set c. Both the above
d. None of the above
Ans. a
18. ______ is the final step of data preprocessing in machine learning. It is a technique to standardize the independent variables of the dataset to a specific range. In _____, we put our variables in the same range and on the same scale so that no variable dominates the others.
a. Feature Sampling b. Feature Scaling c. None of the above d. Both the above
Ans. b
19. Principal components analysis (PCA) is a statistical technique that allows identifying underlying linear patterns in a data set so it can be expressed in terms of other data set of a significantly ____ dimension without much loss of information. a. Lower b. Higher c. Equal d. None of the above
Ans. a
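A minimal Python/NumPy sketch of PCA via the covariance matrix (illustrative only; the 2-D sample points are assumptions), showing how data can be expressed in a lower-dimensional space without much loss of information:

import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
              [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1]])
Xc = X - X.mean(axis=0)                 # centre each feature
cov = np.cov(Xc, rowvar=False)          # covariance matrix of the features
eigvals, eigvecs = np.linalg.eigh(cov)  # eigen-decomposition (ascending eigenvalues)
pc1 = eigvecs[:, np.argmax(eigvals)]    # direction of maximum variance
projected = Xc @ pc1                    # project the 2-D data onto 1 dimension
print(projected)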
20. _____ units which are internal to the network and do not directly interact with the environment. a. Input b. Output c. Hidden d. None of the above
Ans. c
21. In a ____ network there is an ordering imposed on the nodes of the network: if there is a connection from unit a to unit b, then there cannot be a connection from b to a. a. Feedback b. Feed-Forward c. None of the above
Ans. b
22. _____ contains the multiple logical values and these values are the truth values of a variable or problem between 0 and 1. This concept was introduced by Lofti Zadeh in 1965 a. Boolean Logic b. Fuzzy Logic c. None of the above
Ans. b
23. ______is a module or component, which takes the fuzzy set inputs generated by the Inference Engine, and then transforms them into a crisp value. a. Fuzzification b. Defuzzification c. Inference Engine d. None of the above
Ans. b
24. The most common application of time series analysis is forecasting future values of a numeric value using the ______ structure of the ____ a. Shares,data b. Temporal,data c. Permanent,data d. None of these
Ans. b
25. Identify the component of a time series a. Temporal b. Shares c. Trend d. Policymakers
Ans. c
26. Predictable pattern that recurs or repeats over regular intervals. Seasonality is often observed within a year or less: This define the term__________ a. Trend b. Seasonality c. Cycles d. Recession
Ans. b
27. ________Learning uses a training set that consists of a set of pattern pairs: an input pattern and the corresponding desired (or target) output pattern. The desired output may be regarded as the ‘network’s ‘teacher” for that input a. Unsupervised b. Supervised c. Modular d. Object
Ans. b
28. The _______ perceptron consists of a set of input units connected by a single layer of weights to a set of output units a. Multi layer b. Single layer c. Hidden layer d. None of these
Ans. b
29. If we add another layer of weights to a single-layer perceptron, we find that there is a new set of units that are neither input nor output units; a network with more than one such layer of weights is called a a. Single layer perceptron b. Multi layer perceptron c. Hidden layer d. None of these
Ans. b
30. Patterns that repeat over a certain period of time a. Seasonal b. Trend c. None of the above d. Both of the above
Ans. a
31. Which of the following is characteristic of best machine learning method ?
a. Fast b. Accuracy c. Scalable d. All of the Mentioned
Ans. d
32. Supervised learning differs from unsupervised clustering in that supervised learning requires a. at least one input attribute. b. input attributes to be categorical. c. at least one output attribute. d. output attributes to be categorical. Ans. c
33. Supervised learning and unsupervised clustering both require at least one a. hidden attribute. b. output attribute. c. input attribute. d. categorical attribute. Ans. c
34. Which statement is true about prediction problems? a. The output attribute must be categorical. b. The output attribute must be numeric. c. The resultant model is designed to determine future outcomes. d. The resultant model is designed to classify current behavior. Ans. c
35. Which statement is true about neural network and linear regression models? a. Both models require input attributes to be numeric. b. Both models require numeric attributes to range between 0 and 1. c. The output of both models is a categorical attribute value. d. Both techniques build models whose output is determined by a linear sum of weighted input attribute values. Ans. a
36. A feed-forward neural network is said to be fully connected when a. all nodes are connected to each other. b. all nodes at the same layer are connected to each other. c. all nodes at one layer are connected to all nodes in the next higher layer. d. all hidden layer nodes are connected to all output layer nodes. Ans. c
37. Machine learning techniques differ from statistical techniques in that machine learning methods a. typically assume an underlying distribution for the data.
b. are better able to deal with missing and noisy data. c. are not able to explain their behavior. d. have trouble with large-sized datasets. Ans. b
38. This supervised learning technique can process both numeric and categorical input attributes. a. linear regression b. Bayes classifier c. logistic regression d. backpropagation learning Ans. b
39. This technique associates a conditional probability value with each data instance. a. linear regression b. logistic regression c. simple regression d. multiple linear regression Ans. b
40. Logistic regression is a ________ regression technique that is used to model data having a _____ outcome. a. linear, numeric b. linear, binary c. nonlinear, numeric d. nonlinear, binary Ans. d
41. Which of the following problems is best solved using time-series analysis? a. Predict whether someone is a likely candidate for having a stroke. b. Determine if an individual should be given an unsecured loan. c. Develop a profile of a star athlete. d. Determine the likelihood that someone will terminate their cell phone contract.
Ans. d
42. Which of the following is true about Naive Bayes? a. Assumes that all the features in a dataset are equally important b. Assumes that all the features in a dataset are independent
c. Both A and B d. None of the above options Ans. c 43. Simple regression assumes a __________ relationship between the input attribute and output attribute. a. linear b. quadratic c. reciprocal d. inverse Ans. a
44. With Bayes classifier, missing data items are a. treated as equal compares. b. treated as unequal compares. c. replaced with a default value. d. ignored. 45. What is Machine learning? a. The autonomous acquisition of knowledge through the use of computer programs b. The autonomous acquisition of knowledge through the use of manual programs c. The selective acquisition of knowledge through the use of computer programs d. The selective acquisition of knowledge through the use of manual programs
Ans: a
46. Automated vehicle is an example of ______ a. Supervised learning b. Unsupervised learning c. Active learning d. Reinforcement learning
Ans: a
47. Multilayer perceptron network is a. Usually, the weights are initially set to small random values b. A hard-limiting activation function is often used c. The weights can only be updated after all the training vectors have been presented d. Multiple layers of neurons allow for less complex decision boundaries than a single layer
Ans: a
48. Neural networks a. optimize a convex cost function b. cannot be used for regression as well as classification c. always output values between 0 and 1 d. can be used in an ensemble
Ans: d
49. In neural networks, nonlinear activation functions such as sigmoid, tanh, and ReLU a. speed up the gradient calculation in backpropagation, as compared to linear units b. are applied only to the output units c. help to learn nonlinear decision boundaries d. always output values between 0 and 1
Ans: c
50. Which of the following is a disadvantage of decision trees?
a. Factor analysis b. Decision trees are robust to outliers c. Decision trees are prone to be overfit d. None of the above
Ans: c
51. Back propagation is a learning technique that adjusts weights in the neural network by propagating weight changes. a. Forward from source to sink b. Backward from sink to source c. Forward from source to hidden nodes d. Backward from sink to hidden nodes
Ans: b
52. Identify the following activation function: φ(V) = Z + 1 / (1 + exp(-X*V + Y)), where Z, X, Y are parameters
a. Step function b. Ramp function c. Sigmoid function
d. Gaussian function
Ans: c
53. An artificial neuron receives n inputs x1, x2, x3, ..., xn with weights w1, w2, ..., wn attached to the input links. The weighted sum _________________ is computed and passed on to a non-linear filter Φ called the activation function to release the output. a. Σ wi b. Σ xi c. Σ wi + Σ xi d. Σ wi * xi
Ans: d
54. With Bayes classifier, missing data items are a. treated as equal compares. b. treated as unequal compares. c. replaced with a default value. d. ignored.
Ans:b
55. Machine learning techniques differ from statistical techniques in that machine learning methods a. typically assume an underlying distribution for the data. b. are better able to deal with missing and noisy data. c. are not able to explain their behavior. d. have trouble with large-sized datasets.
Ans: b
56. Which of the following is true about Naive Bayes?
a. Assumes that all the features in a dataset are equally important b. Assumes that all the features in a dataset are independent c. Both a and b d. None of the above options
Ans: c
57. How many terms are required for building a Bayes model?
a. 1 b. 2 c. 3 d. 4
Ans: c
58. What does the Bayesian network provides? a. Complete description of the domain b. Partial description of the domain c. Complete description of the problem d. None of the mentioned
Ans: a
59. How the Bayesian network can be used to answer any query? a. Full distribution b. Joint distribution c. Partial distribution d. All of the mentioned
Ans: b
60. In which of the following learning the teacher returns reward and punishment to learner? a. Active learning b. Reinforcement learning c. Supervised learning d. Unsupervised learning
Ans: b
61. Which of the following is the model used for learning? a. Decision trees b. Neural networks c. Propositional and FOL rules d. All of the mentioned
Ans: d
Department of IT
COURSE B.Tech., VI SEM, MCQ Assignment (2020-21) Even Semester UNIT 3 DataAnalytics(KIT601)
Q.1 Which attribute is _not_ indicative for data streaming?
A) Limited amount of memory
B) Limited amount of processing time
C) Limited amount of input data
D) Limited amount of processing power
Ans. C)
Q.2 Which of the following statements about data streaming is true?
A) Stream data is always unstructured data.
B) Stream data often has a high velocity.
C) Stream elements cannot be stored on disk.
D) Stream data is always structured data.
Ans. B
Q.3 What is the main difference between standard reservoir sampling and min-wise sampling?
A) Reservoir sampling makes use of randomly generated numbers whereas minwise sampling does not.
B) Min-wise sampling makes use of randomly generated numbers whereas reservoir sampling does not.
C) Reservoir sampling requires a stream to be processed sequentially, whereas minwise does not.
D) For larger streams, reservoir sampling creates more accurate samples than minwise sampling.
Ans. C)
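A minimal Python sketch of standard reservoir sampling (illustrative only; the stream and sample size are assumptions), showing the single sequential pass and the use of randomly generated indices:

import random

def reservoir_sample(stream, k):
    # Keeps a uniform random sample of size k from a stream of unknown length,
    # reading each element exactly once, in order.
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = random.randint(0, i)   # random index in [0, i]
            if j < k:
                reservoir[j] = item    # replace an existing sample with decreasing probability
    return reservoir

print(reservoir_sample(range(1, 1001), 10))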
Q.4 A Bloom filter guarantees no
A) false positives
B) false negatives
C) false positives and false negatives
D) false positives or false negatives, depending on the Bloom filter type
Ans. B)
Q.5 Which of the following statements about standard Bloom filters is correct?
A) It is possible to delete an element from a Bloom filter.
B) A Bloom filter always returns the correct result.
C) It is possible to alter the hash functions of a full Bloom filter to create more space.
D) A Bloom filter always returns TRUE when testing for a previously added element.
Ans. D)
Q.6 The DGIM algorithm was developed to estimate the count of 1's that occur within the last k bits of a stream window N. Which of the following statements is true about the estimate of the number of 0's based on DGIM?
A) The number of 0's cannot be estimated at all.
B) The number of 0's can be estimated with a maximum guaranteed error.
C) To estimate the number of 0s and 1s with a guaranteed maximum error, DGIM has to be employed twice, one creating buckets based on 1's, and once created buckets based on 0's.
D) None of these
Ans. B)
Q.7 Which of the following statements about the standard DGIM algorithm are false?
A)DGIM operates on a time-based window.
B) DGIM reduces memory consumption through a clever way of storing counts.
C) In DGIM, the size of a bucket is always a power of two.
D) The maximum number of buckets has to be chosen beforehand.
Ans. D)
Q.8 Which of the following statements about the standard DGIM algorithm are false?
A)DGIM operates on a time-based window.
B) DGIM reduces memory consumption through a clever way of storing counts.
C) In DGIM, the size of a bucket is always a power of two.
D) The buckets contain the count of 1's and each 1's specific position in the stream
Ans. D)
Q.9 What are DGIM’s maximum error boundaries? A) DGIM always underestimates the true count; at most by 25%
B) DGIM either underestimates or overestimates the true count; at most by 50%
C) DGIM always overestimates the count; at most by 50%
D) DGIM either underestimates or overestimates the true count; at most by 25%
Ans. B)
Q.10 Which algorithm should be used to approximate the number of distinct elements in a data stream?
A) Misra-Gries
B) Alon-Matias-Szegedy
C) DGIM
D) None of the above
Ans. D)
Q.11 Which algorithm should be used to approximate the number of distinct elements in a data stream?
A) Misra-Gries
B) Alon-Matias-Szegedy
C) DGIM
D) Flajolet and Martin
Ans. D)
Q.12 Which of the following streaming windows show valid bucket representations according to the DGIM rules?
A) 1 0 1 1 1 0 1 0 1 1 1 1 0 1 0 1
B) 1 0 1 1 1 0 0 0 0 1 1 0 0 0 1 0 1 1 1 0 0 1
C) 1 1 1 1 0 0 1 1 1 0 1 0 1
D) 1 0 1 1 0 0 0 1 0 1 1 1 0 1 1 0 0 1 0 1 1
Ans. D)
Q.13 For which of the following streams is the second-order moment F2 greater than 45?
A) 10 5 5 10 10 10 1 1 1 10
B) 10 10 10 10 10 5 5 5 5 5
C) 1 1 1 1 1 5 10 10 5 1
D) None of these
Ans. B)
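Worked check: the second-order moment is F2 = sum over the distinct elements of (count)^2. For stream A the counts are 10 -> 5, 5 -> 2, 1 -> 3, so F2 = 25 + 4 + 9 = 38; for stream B the counts are 10 -> 5 and 5 -> 5, so F2 = 25 + 25 = 50 > 45; for stream C, F2 = 36 + 4 + 4 = 44. Hence only option B exceeds 45 (and in the next question, the all-10 stream gives F2 = 100, so B is again the answer).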
Q.14 For which of the following streams is the second-order moment F2 greater than 45?
A) 10 5 5 10 10 10 1 1 1 10
B) 10 10 10 10 10 10 10 10 10 10
C) 1 1 1 1 1 5 10 10 5 1
D) None of these
Ans. B)
Q 15 : In Bloom filter an array of n bits is initialized with
A) all 0s
B) all 1s
C) half 0s and half 1s
D) all -1
Ans. A)
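A minimal Python sketch of a Bloom filter (illustrative only; the bit-array size, number of hash functions, and sample keys are assumptions): the n-bit array starts as all 0s, inserts set k hashed positions to 1, and lookups can give false positives but never false negatives:

import hashlib

class BloomFilter:
    def __init__(self, n_bits=1024, n_hashes=3):
        self.n = n_bits
        self.k = n_hashes
        self.bits = [0] * n_bits       # array of n bits, initialised with all 0s

    def _positions(self, item):
        # Derive k bit positions from k salted hashes of the item.
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.n

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        # Always True for added items (no false negatives); may be True for others (false positives).
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
bf.add("user42@example.com")
print(bf.might_contain("user42@example.com"))        # True
print(bf.might_contain("someone.else@example.com"))  # usually False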
Q 16. Pick a hash function h that maps each of the N elements to at least log2 N bits, Estimated number of distinct elements is
A) 2^R
B) 2^(-R)
C) 1-(2^R)
D) 1-(2^(-R))
Ans. A)
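A minimal Python sketch of the Flajolet-Martin idea behind this question (illustrative only; the hash function choice and sample stream are assumptions): hash each element, track the maximum number of trailing zero bits R seen, and estimate the number of distinct elements as 2^R:

import hashlib

def trailing_zeros(n):
    # Number of 0 bits at the end of n's binary representation.
    if n == 0:
        return 0
    count = 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count

def fm_estimate(stream):
    R = 0
    for item in stream:
        h = int(hashlib.md5(str(item).encode()).hexdigest(), 16)
        R = max(R, trailing_zeros(h))
    return 2 ** R   # estimated number of distinct elements

print(fm_estimate([1, 2, 3, 2, 1, 4, 5, 3, 6, 7, 8, 2]))  # rough estimate of the 8 distinct values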
Q.17 Sliding window operations typically fall in the category
A) OLTP Transactions
B) Big Data Batch Processing
C) Big Data Real Time Processing
D) Small Batch Processing
Ans. C)
Q.18 What is the finally produced by Hierarchical Agglomerative Clustering?
A) final estimate of cluster centroids
B)assignment of each point to clusters
C) tree showing how close things are to each other
D) Group of clusters
Ans. C)
Q19 Which of the algorithm can be used for counting 1's in a stream
A) FM Algorithm
B) PCY Algorithm
C) DGIM Algorithm
D) SON Algorithm
Ans. C)
Q20 Which technique is used to filter unnecessary itemset in PCY algorithm
A) Association Rule
B) Hashing Technique
C) Data Mining
D) Market basket
Ans. B)
Q21 In association rule, which of the following indicates the measure of how frequently the items occur in a dataset ?
A) Support B) Confidence C) Basket D) Itemset
Ans. A)
Q.22 which of the following clustering technique is used by K- Means Algorithm
A) Hierarchical Technique
B) Partitional technique
C)Divisive
D) Agglomerative
Ans. B)
Q.23 which of the following clustering technique is used by Agglomerative Nesting Algorithm
A) Hierarchical Technique
B) Partitional technique
C) Density based
D) None of these
Ans. A)
Q24. Which of the following hierarchical approaches begins with each observation in a distinct (singleton) cluster, and successively merges clusters together until a stopping criterion is satisfied?
A) Divisive
B) Agglomerative
C) Single Link
D) Complete Link
Ans. B)
Q.25 Park, Chen, Yu algorithm is useful for __________in Big Data Application.
A) Find Frequent Itemset
B) Filtering Stream
C) Distinct Element Find
D) None of these
Ans. A)
Q.26 Match the following
a) Bloom filter i) Frequent Pattern Mining
b) FM Algorithm ii) Filtering Stream
c) PCY Algorithm iii) Distinct Element Find
d) DGIM Algorithm iv) Counting 1's in window
A) a)-ii), b)-iii), c)-i), d)-iv)
B) a)-iii), b)-ii), c)-i), d)-iv)
C) a)-i), b)-iii), c)-ii), d)-iv)
D) None of these
Ans. A)
SET II
1. Which of the following can be considered as the correct process of Data Mining? a. Infrastructure, Exploration, Analysis, Interpretation, Exploitation b. Exploration, Infrastructure, Analysis, Interpretation, Exploitation c. Exploration, Infrastructure, Interpretation, Analysis, Exploitation d. Exploration, Infrastructure, Analysis, Exploitation, Interpretation
Answer: a
Explanation: The process of data mining contains many sub-processes in a specific order. The correct order in which all sub-processes of data mining executes is Infrastructure, Exploration, Analysis, Interpretation, and Exploitation.
2. Which of the following is an essential process in which the intelligent methods are applied to extract data patterns? a. Warehousing b. Data Mining c. Text Mining d. Data Selection
Answer: b
Explanation: Data mining is a type of process in which several intelligent methods are used to extract meaningful data from the huge collection (or set) of data.
3. What are the functions of Data Mining? a. Association and correctional analysis classification b. Prediction and characterization
c. Cluster analysis and Evolution analysis d. All of the above
Answer: d
Explanation: In data mining, there are several functionalities used for performing the different types of tasks. The common functionalities used in data mining are cluster analysis, prediction, characterization, and evolution. Still, the association and correctional analysis classification are also one of the important functionalities of data mining.
4. Which attribute is _not_ indicative for data streaming?
a. Limited amount of memory b. Limited amount of processing time c. Limited amount of input data d. Limited amount of processing power
Ans. c
5. Which of the following statements about data streaming is true?
a. Stream data is always unstructured data. b. Stream data often has a high velocity. c. Stream elements cannot be stored on disk. d. Stream data is always structured data.
Ans. b
6. Which of the following statements about sampling are correct? a. Sampling reduces the amount of data fed to a subsequent data mining algorithm b. Sampling reduces the diversity of the data stream c. Sampling increases the amount of data fed to a data mining algorithm d. Sampling algorithms often need multiple passes over the data
Ans. a
7. Which of the following statements about sampling are correct? a. Sampling reduces the diversity of the data stream
b. Sampling increases the amount of data fed to a data mining algorithm c. Sampling algorithms often need multiple passes over the data d. Sampling aims to keep statistical properties of the data intact
Ans. d
8. What is the main difference between standard reservoir sampling and min-wise sampling?
a. Reservoir sampling makes use of randomly generated numbers whereas min-wise sampling does not. b. Min-wise sampling makes use of randomly generated numbers whereas reservoir sampling does not. c. Reservoir sampling requires a stream to be processed sequentially, whereas min-wise does not. d. For larger streams, reservoir sampling creates more accurate samples than min-wise sampling.
Ans. c
9. A Bloom filter guarantees no
a. false positives b. false negatives c. false positives and false negatives d. false positives or false negatives, depending on the Bloom filter type
Ans. b
10. Which of the following statements about standard Bloom filters is correct?
a. It is possible to delete an element from a Bloom filter. b. A Bloom filter always returns the correct result. c. It is possible to alter the hash functions of a full Bloom filter to create more space. d. A Bloom filter always returns TRUE when testing for a previously added element.
Ans. d
11. The FM-sketch algorithm uses the number of zeros the binary hash value ends in to make an estimation. Which of the following statements is true about the hash tail?
a. Any specific bit pattern is equally suitable to be used as hash tail.
b. Only bit patterns with more 0's than 1's are equally suitable to be used as hash tails. c. Only the bit patterns 0000000..00 (list of 0s) or 111111..11 (list of 1s) are suitable hash tails. d. Only the bit pattern 0000000..00 (list of 0s) is a suitable hash tail.
Ans. a
12. The FM-sketch algorithm can be used to:
a. Estimate the number of distinct elements. b. Sample data with a time-sensitive window. c. Estimate the frequent elements. d. Determine whether an element has already occurred in previous stream data.
Ans. a
13. The DGIM algorithm was developed to estimate the count of 1's that occur within the last k bits of a stream window N. Which of the following statements is true about the estimate of the number of 0's based on DGIM?
a. The number of 0's cannot be estimated at all. b. The number of 0's can be estimated with a maximum guaranteed error. c. To estimate the number of 0s and 1s with a guaranteed maximum error, DGIM has to be employed twice, one creating buckets based on 1's, and once created buckets based on 0's. d. None of above
Ans. b
14. Which of the following statements about the standard DGIM algorithm are false? a. DGIM operates on a time-based window b. DGIM reduces memory consumption through a clever way of storing counts c. In DGIM, the size of a bucket is always a power of two d. The maximum number of buckets has to be chosen beforehand. Ans. d
15. Which of the following statements about the standard DGIM algorithm are false? a. DGIM operates on a time-based window b. The buckets contain the count of 1's and each 1's specific position in the stream c. DGIM reduces memory consumption through a clever way of storing counts
d. In DGIM, the size of a bucket is always a power of two Ans. b
16. What are DGIM’s maximum error boundaries?
a. DGIM always underestimates the true count; at most by 25% b. DGIM either underestimates or overestimates the true count; at most by 50% c. DGIM always overestimates the count; at most by 50% d. DGIM either underestimates or overestimates the true count; at most by 25%
Ans. b
17. Which algorithm should be used to approximate the number of distinct elements in a data stream?
a. Misra-Gries b. Alon-Matias-Szegedy c. DGIM d. None of the above
Ans. d
18. Which of the following statements about Bloom filters are correct?
a. A Bloom filter has the same properties as a standard Hashmap data structure in Java (java.util.HashMap). b. A Bloom filter is full if no more hash functions can be added to it. c. A Bloom filter always returns FALSE when testing for an element that was not previously added d. A Bloom filter always returns TRUE when testing for a previously added element
Ans. d
19. Which of the following statements about Bloom filters are correct?
a. An empty Bloom filter (no elements added to it) will always return FALSE when testing for an element b. A Bloom filter has the same properties as a standard Hashmap data structure in Java (java.util.HashMap). c. A Bloom filter is full if no more hash functions can be added to it.
d. A Bloom filter always returns FALSE when testing for an element that was not previously added Ans. a
20. Which of the following streaming windows show valid bucket representations according to the DGIM rules?
a. 1 0 1 1 1 0 1 0 1 1 1 1 0 1 0 1 b. 1 0 1 1 1 0 0 0 0 1 1 0 0 0 1 0 1 1 1 0 0 1 c. 1 1 1 1 0 0 1 1 1 0 1 0 1 d. 1 0 1 1 0 0 0 1 0 1 1 1 0 1 1 0 0 1 0 1 1
Ans. d
21. For which of the following streams is the second-order moment F2 greater than 45?
a. 10 5 5 10 10 10 1 1 1 10
b. 10 10 10 10 10 5 5 5 5 5
c. 1 1 1 1 1 5 10 10 5 1
d. 10 10 10 10 10 10 10 10 10 10
Ans. b and d
22. What is the space complexity of the FREQUENT algorithm? Recall that it aims to find all elements in a sequence whose frequency exceeds 1/k of the total count. In the options below, n is the maximum value of each key and m is the maximum value of each counter.
a. O(k(log m + log n)) b. o(k(log m + log n)) c. O(log k(m + n)) d. o(log k(m + n))
Ans. a
19) Which of the following statements is correct about data mining?
a. It can be referred to as the procedure of mining knowledge from data b. Data mining can be defined as the procedure of extracting information from a set of the data c. The procedure of data mining also involves several other processes like data cleaning, data transformation, and data integration d. All of the above
Answer: d
Explanation: The term data mining can be defined as the process of extracting information from the massive collection of data. In other words, we can also say that data mining is the procedure of mining useful knowledge from a huge set of data.
25) The classification of the data mining system involves:
a. Database technology b. Information Science c. Machine learning d. All of the above
Answer: d
Explanation: Generally, the classification of a data mining system depends on the following criteria: Database technology, machine learning, visualization, information science, and several other disciplines.
27) The issues like efficiency and scalability of data mining algorithms come under _______
a. Performance issues b. Diverse data type issues c. Mining methodology and user interaction d. All of the above
Answer: a
Explanation: In order to extract information effectively from a huge collection of data in databases, the data mining algorithm must be efficient and scalable. Therefore the correct answer is A.
Department of IT
COURSE B.Tech., VI SEM, MCQ Assignment (2020-21) Even Semester UNIT 4 DataAnalytics(KIT601)
1. What does Apriori algorithm do? a. It mines all frequent patterns through pruning rules with lesser support b. It mines all frequent patterns through pruning rules with higher support c. Both a and b d. None of these
Ans. a
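A minimal Python sketch of level-wise frequent-itemset mining in the spirit of Apriori (illustrative only; the toy transactions and the min_support value are assumptions): itemsets that fail the support threshold are pruned, so their supersets are never counted:

def apriori(transactions, min_support):
    def support_count(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in items if support_count(s) >= min_support}
    frequent = []
    while level:
        frequent.extend(level)
        size = len(next(iter(level))) + 1
        # Candidates are built only from itemsets that are already frequent (Apriori pruning).
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        level = {c for c in candidates if support_count(c) >= min_support}
    return frequent

transactions = [frozenset(t) for t in [{"milk", "bread"}, {"milk", "bread", "butter"},
                                       {"bread", "butter"}, {"milk", "butter"}]]
print(apriori(transactions, min_support=2))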
2. What techniques can be used to improve the efficiency of apriori algorithm? a. hash based techniques b. transaction reduction c. Partitioning d. All of these
Ans.d 3. What do you mean by support (A)?
a. Total number of transactions containing A b. Total Number of transactions not containing A c. Number of transactions containing A / Total number of transactions d. Number of transactions not containing A / Total number of transactions
Ans. c 4. Which of the following is direct application of frequent itemset mining? a. Social Network Analysis b. Market Basket Analysis c. outlier detection
d. intrusion detection
Ans. b 5. When do you consider an association rule interesting? a. If it only satisfies min_support b. If it only satisfies min_confidence c. If it satisfies both min_support and min_confidence d. There are other measures to check so
Ans. c
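Worked illustration (hypothetical numbers): in 10 transactions, suppose {milk} appears in 5 and {milk, bread} appears in 4. Then support(milk -> bread) = 4/10 = 0.4 and confidence(milk -> bread) = support(milk and bread) / support(milk) = 4/5 = 0.8; the rule is considered interesting only if both values meet the min_support and min_confidence thresholds.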
6. What is the difference between absolute and relative support? a. Absolute -Minimum support count threshold and Relative-Minimum support threshold b. Absolute-Minimum support threshold and Relative-Minimum support count threshold c. Both a and b d. None of these
Ans. a
7. What is the relation between candidate and frequent itemsets?
a. A candidate itemset is always a frequent itemset b. A frequent itemset must be a candidate itemset c. No relation between the two d. None of these
Ans. b
8. What is the principle on which Apriori algorithm work?
a. If a rule is infrequent, its specialized rules are also infrequent b. If a rule is infrequent, its generalized rules are also infrequent c. Both a and b d. None of these
Ans. a
9. Which of these is not a frequent pattern mining algorithm a. Apriori b. FP growth c. Decision trees d. Eclat
Ans. c
10. What are closed frequent itemsets?
a. A closed itemset b. A frequent itemset c. An itemset which is both closed and frequent d. None of these
Ans. c
11. What are maximal frequent itemsets? a. A frequent item set whose no super-itemset is frequent b. A frequent itemset whose super-itemset is also frequent c. Both a and b d. None of these
Ans. a
12. What is association rule mining?
a. Same as frequent itemset mining b. Finding of strong association rules using frequent itemsets c. Both a and b d. None of these
Ans. b
13. What is frequent pattern growth?
a. Same as frequent itemset mining b. Use of hashing to make discovery of frequent itemsets more efficient c. Mining of frequent itemsets without candidate generation d. None of these
Ans. c
14. When is sub-itemset pruning done?
a. A frequent itemset ‘P’ is a proper subset of another frequent itemset ‘Q’ b. Support (P) = Support(Q) c. When both a and b is true d. When a is true and b is not
Ans. c
15. Our use of association analysis will yield the same frequent itemsets and strong association rules whether a specific item occurs once or three times in an individual transaction
a. TRUE b. FALSE c. Both a and b d. None of these
Ans. a
16. The number of iterations in apriori __
a. increases with the size of the data b. decreases with the increase in size of the data c. increases with the size of the maximum frequent set d. decreases with increase in size of the maximum frequent set
Ans. c
17. Frequent item sets are a. Superset of only closed frequent item sets b. Superset of only maximal frequent item sets c. Subset of maximal frequent item sets d. Superset of both closed frequent item sets and maximal frequent item sets
Ans. d
18. Significant Bottleneck in the Apriori algorithm is a. Finding frequent itemsets b. pruning c. Candidate generation d. Number of iterations
Ans. c
19. Which Association Rule would you prefer a. High support and medium confidence b. High support and low confidence c. Low support and high confidence d. Low support and low confidence
Ans. c
20. The apriori property means a. If a set cannot pass a test, its supersets will also fail the same test b. To decrease the efficiency, do level-wise generation of frequent item sets c. To improve the efficiency, do level-wise generation of frequent item sets d. If a set can pass a test, its supersets will fail the same test
Ans. a
21. To determine association rules from frequent item sets a. Only minimum confidence needed b. Neither support not confidence needed c. Both minimum support and confidence are needed d. Minimum support is needed
Ans. c
22. A collection of one or more items is called as _____
( a ) Itemset ( b ) Support ( c ) Confidence ( d ) Support Count Ans. a
23. Frequency of occurrence of an itemset is called as _____
(a) Support (b) Confidence (c) Support Count (d) Rules Ans. c
24. An itemset whose support is greater than or equal to a minimum support threshold is ______
(a) Itemset (b) Frequent Itemset (c) Infrequent items (d) Threshold values
Ans. b
25. The goal of clustering is to- a. Divide the data points into groups b. Classify the data point into different classes c. Predict the output values of input data points d. All of the above
Ans. a
26. Clustering is a- a. Supervised learning b. Unsupervised learning c. Reinforcement learning d. None Ans. b 27. Which of the following clustering algorithms suffers from the problem of convergence at local optima? a. K- Means clustering b. Hierarchical clustering c. Diverse clustering d. All of the above Ans. d
28. Which version of the clustering algorithm is most sensitive to outliers? a. K-means clustering algorithm b. K-modes clustering algorithm c. K-medians clustering algorithm d. None
Ans. a 29. Which of the following is a bad characteristic of a dataset for clustering analysis-
a. Data points with outliers b. Data points with different densities c. Data points with non-convex shapes d. All of the above Ans. d
30. For clustering, we do not require- a. Labeled data b. Unlabeled data c. Numerical data d. Categorical data
Ans. a 31. The final output of Hierarchical clustering is- a. The number of cluster centroids b. The tree representing how close the data points are to each other c. A map defining the similar data points into individual groups d. All of the above Ans. b
32. Which of the step is not required for K-means clustering?
a. a distance metric b. initial number of clusters c. initial guess as to cluster centroids d. None Ans. d
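A minimal Python sketch of plain k-means on 2-D points (illustrative only; the points, k, and iteration count are assumptions): it needs a distance metric, an initial number of clusters, and an initial guess for the centroids, then alternates assignment and centroid updates:

import random

def kmeans(points, k, iters=20):
    centroids = random.sample(points, k)           # initial guess for the centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            idx = min(range(k),
                      key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2)
            clusters[idx].append(p)
        for i, c in enumerate(clusters):
            if c:                                  # keep the old centroid if a cluster is empty
                centroids[i] = (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
    return centroids, clusters

points = [(1, 1), (1.5, 2), (3, 4), (5, 7), (3.5, 5), (4.5, 5), (3.5, 4.5)]
print(kmeans(points, 2)[0])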
33. Which of the following uses a merging approach? a. Hierarchical clustering b. Partitional clustering c. Density-based clustering d. All of the above Ans. a
34. When does k-means clustering stop creating or optimizing clusters? a. After finding no new reassignment of data points b. After the algorithm reaches the defined number of iterations c. Both A and B d. None Ans. c
35. Which of the following clustering algorithms follows a top to bottom approach? a. K-means b. Divisive c. Agglomerative d. None Ans. b
36. Which algorithm does not require a dendrogram? a. K-means b. Divisive c. Agglomerative d. None
Ans. a 37. What is a dendrogram?
a. A hierarchical structure b. A diagram structure c. A graph structure d. None
Ans. a
38. Which one of the following can be considered as the final output of the hierarchal type of clustering? a. A tree which displays how the close thing are to each other b. Assignment of each point to clusters c. Finalize estimation of cluster centroids d. None of the above
Ans. a
39. Which one of the following statements about the K-means clustering is incorrect?
a. The goal of the k-means clustering is to partition (n) observation into (k) clusters b. K-means clustering can be defined as the method of quantization c. The nearest neighbor is the same as the K-means d. All of the above
Ans. c
40. The self-organizing maps can also be considered as the instance of _________ type of learning.
a. Supervised learning b. Unsupervised learning c. Missing data imputation d. Both A & C
Ans. b
41. Euclidean distance measure can also be defined as ___________
a. The process of finding a solution for a problem simply by enumerating all possible solutions according to some predefined order and then testing them
b. The distance between two points as calculated using the Pythagoras theorem c. A stage of the KDD process in which new data is added to the existing selection. d. All of the above
Ans. b
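Worked check: for two points (x1, y1) and (x2, y2), the Euclidean distance is sqrt((x1 - x2)^2 + (y1 - y2)^2); for example, the distance between (1, 2) and (4, 6) is sqrt(9 + 16) = 5.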
42. Which of the following refers to the sequence of pattern that occurs frequently?
a. Frequent sub-sequence b. Frequent sub-structure c. Frequent sub-items d. All of the above
Ans. a 43. Which method of analysis does not classify variables as dependent or independent? a) Regression analysis b) Discriminant analysis c) Analysis of variance d) Cluster analysis Answer: (d)
Department of IT
COURSE B.Tech., VI SEM, MCQ Assignment (2020-21) Even Semester DataAnalytics(KIT601)
1. The Process of describing the data that is huge and complex to store and process is known as
a. Analytics b. Data mining c. Big Data d. Data Warehouse
Ans C
2. Data generated from online transactions is one of the example for volume of big data. Is this true or False. a. TRUE b. FALSE
Ans. a 3. Velocity is the speed at which the data is processed
a. TRUE b. FALSE
Ans. b
4. _____________ have a structure but cannot be stored in a database.
a. Structured b. Semi-Structured c. Unstructured d. None of these
Ans. b 5. ____________refers to the ability to turn your data useful for business.
a. Velocity b. Variety c. Value d. Volume
Ans. C
6. Value tells the trustworthiness of data in terms of quality and accuracy.
a. TRUE b. FALSE
Ans. b 7. GFS consists of a ____________ Master and ___________ Chunk Servers a. Single, Single b. Multiple, Single c. Single, Multiple
d. Multiple, Multiple
Ans. c
8. Files are divided into ____________ sized Chunks. a. Static b. Dynamic c. Fixed d. Variable Ans. c
9. ____________is an open source framework for storing data and running application on clusters of commodity hardware. a. HDFS b. Hadoop c. MapReduce d. Cloud Ans. B
10. How much data (in MB) does HDFS store in each block, which can be scaled at any time? a. 32 b. 64 c. 128 d. 256 Ans. c
11. Hadoop MapReduce allows you to perform distributed parallel processing on large volumes of data quickly and efficiently. True or False? a. TRUE b. FALSE Ans. a
12. Hortonworks was introduced by Cloudera and owned by Yahoo. a. TRUE b. FALSE Ans. b
13. Hadoop YARN is used for Cluster Resource Management in Hadoop Ecosystem. a. TRUE b. FALSE Ans. a
14. Google Introduced MapReduce Programming model in 2004. a. TRUE b. FALSE Ans. A
15.______________ phase sorts the data & ____________creates logical clusters. a. Reduce, YARN b. MAP, YARN c. REDUCE, MAP d. MAP, REDUCE Ans. d
16. There is only one operation between Mapping and Reducing. True or False?
a. TRUE b. FALSE
Ans. A
17. __________ is a factor considered before adopting Big Data technology. a. Validation b. Verification c. Data d. Design Ans. a
18. _________ analytics is used for improving supply chain management to optimize stock management, replenishment, and forecasting. a. Descriptive b. Diagnostic c. Predictive d. Prescriptive Ans. c
19. Which among the following is not a data mining and analytical application? a. profile matching b. social network analysis c. facial recognition d. Filtering Ans. d
20. ________________ as a result of data accessibility, data latency, data availability, or limits on bandwidth in relation to the size of inputs. a. Computation-restricted throttling b. Large data volumes c. Data throttling d. Benefits from data parallelization Ans. c
21. As an example, an expectation of using a recommendation engine would be to increase same-customer sales by adding more items into the market basket. a. Lowering costs b. Increasing revenues c. Increasing productivity d. Reducing risk Ans. b
22. Which capability allows a storage subsystem to support massive data volumes of increasing size? a. Extensibility b. Fault tolerance c. Scalability d. High-speed I/O capacity Ans. c
23. ______________provides performance through distribution of data and fault tolerance through replication a. HDFS b. PIG c. HIVE d. HADOOP
Ans. a
24. ______________ is a programming model for writing applications that can process Big Data in parallel on multiple nodes. a. HDFS b. MAP REDUCE c. HADOOP d. HIVE Ans. b
25. _____________________ takes the grouped key-value paired data as input and runs a Reducer function on each one of them. a. MAPPER b. REDUCER c. COMBINER d. PARTITIONER Ans. b
26. _______________ is a type of local Reducer that groups similar data from the map phase into identifiable sets. a. MAPPER b. REDUCER c. COMBINER d. PARTITIONER. Ans. c
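To make the Mapper/Reducer/Combiner roles in questions 24-26 concrete, here is a rough word-count sketch in plain R; it only mimics the idea and is not actual Hadoop MapReduce code:
# Illustrative only: the MapReduce idea (map -> shuffle/group -> reduce) in plain R
docs <- c("big data", "data lake", "big big data")
# Map: emit a (word, 1) pair for every word in every document
pairs <- unlist(lapply(strsplit(docs, " "),
                       function(ws) setNames(rep(1, length(ws)), ws)))
# Shuffle/group: gather all values that share the same key (word)
grouped <- split(unname(pairs), names(pairs))
# Reduce: combine each group into a smaller set of (word, count) pairs
counts <- sapply(grouped, sum)
counts  # big = 3, data = 3, lake = 1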
27. MongoDB is __________________ a. Column Based b. Key Value Based c. Document Based d. Graph Based Ans. c
28. ____________ is the process of storing data records across multiple machines a. Sharding b. HDFS c. HIVE d. HBASE Ans. a
29. The results of a hive query can be stored as a. Local File b. HDFS File c. Both d. Cannot be stored Ans. c 30. The position of a specific column in a Hive table a. can be anywhere in the table creation clause b. must match the position of the corresponding data in the data file c. Must match the position only for date time data type in the data file d. Must be arranged alphabetically Ans. b 31. The Hbase tables are A. Made read only by setting the read-only option B. Always writeable
C. Always read-only D. Are made read only using the query to the table
Ans. a 32. HBase creates a new version of a record during A. Creation of a record B. Modification of a record C. Deletion of a record D. All the above Ans. d 33. Which among the following is incorrect with regard to NoSQL? a. It is easy and ready to manage with clusters. b. Suitable for upcoming data explosions. c. It requires keeping track of the data structure. d. Provides an easy and flexible system. Ans. c 34. Which NoSQL database administrator job was trending according to job trends? a. MongoDB b. CouchDB c. SimpleDB d. Redis Ans. a 35. NoSQL means _________________ a. Not SQL b. No usage of SQL c. Not Only SQL d. Not for SQL Ans. c 36. A list of 5 pulse rates is: 70, 64, 80, 74, 92. What is the median for this list? a. 74 b. 76 c. 77 d. 80 Ans. a 37. Which of the following would indicate that a dataset is not bell-shaped? a. The range is equal to 5 standard deviations. b. The range is larger than the interquartile range. c. The mean is much smaller than the median. d. There are no outliers Ans. c 38. What is the effect of an outlier on the value of a correlation coefficient? a. An outlier will always decrease a correlation coefficient. b. An outlier will always increase a correlation coefficient. c. An outlier might either decrease or increase a correlation coefficient, depending on where it is in relation to the other points. d. An outlier will have no effect on a correlation coefficient. Ans. c 39. One use of a regression line is a. to determine if any x-values are outliers. b. to determine if any y-values are outliers. c. to determine if a change in x causes a change in y. d. to estimate the change in y for a one-unit change in x. Ans. d 40. Which package contains most of the basic functions in R? a. Root b. Basic c. Parent
d. R
Ans. b
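Questions 36-39 above can be checked quickly in R; a small illustrative sketch (all numbers other than the pulse list are made up):
# Illustrative only: median, an outlier's effect on correlation, and a regression slope
pulse <- c(70, 64, 80, 74, 92)
median(pulse)                # 74, the middle value of the sorted list
set.seed(1)
x <- 1:10
y <- 2 * x + rnorm(10)
cor(x, y)                    # strong positive correlation
cor(c(x, 30), c(y, -50))     # a single extreme point can push it either way
coef(lm(y ~ x))["x"]         # estimated change in y for a one-unit change in x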
SET II
1. Who was the developer of Hadoop?
A. Apache Software Foundation B. Hadoop Software Foundation C. Sun Microsystems D. Bell Labs View Answer Ans : A
Explanation: Hadoop Developed by: Apache Software Foundation.
2. Hadoop is written in which language?
A. C B. C++ C. Java D. Python View Answer Ans : C
Explanation: Hadoop is written in Java. 3. What was the initial release date of Hadoop?
A. 1st April 2007 B. 1st April 2006 C. 1st April 2008 D. 1st April 2005 View Answer Ans : B
Explanation: Initial release: April 1, 2006. 4. What license is Hadoop distributed under?
A. Apache License 2.1 B. Apache License 2.2 C. Apache License 2.0 D. Apache License 1.0 View Answer Ans : C
Explanation: Hadoop is Open Source, released under Apache 2 license.
5. IBM and ________ have announced a major initiative to use Hadoop to support university courses in distributed computer programming.
A. Google B. Apple C. Facebook D. Microsoft View Answer Ans : A
Explanation: Google and IBM announced a university initiative to address Internet-scale computing. 6. On which platform does Hadoop run?
A. Bare metal B. Debian C. Cross-platform D. Unix-Like View Answer Ans : C
Explanation: Hadoop has support for cross platform operating system.
10. Which of the following is not a feature of Hadoop?
A. Suitable for Big Data Analysis B. Scalability C. Robust D. Fault Tolerance View Answer Ans : C
Explanation: Robust is not a feature of Hadoop.
1. The MapReduce algorithm contains two important tasks, namely __________.
A. mapped, reduce B. mapping, Reduction C. Map, Reduction D. Map, Reduce View Answer Ans : D
Explanation: The MapReduce algorithm contains two important tasks, namely Map and Reduce. 2. _____ takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
A. Map B. Reduce C. Both A and B D. Node View Answer Ans : A
Explanation: Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). 3. ______ task, which takes the output from a map as an input and combines those data tuples into a smaller set of tuples.
A. Map B. Reduce C. Node D. Both A and B View Answer Ans : B
Explanation: Reduce task, which takes the output from a map as an input and combines those data tuples into a smaller set of tuples. 4. In how many stages does the MapReduce program execute?
A. 2 B. 3 C. 4 D. 5 View Answer
Ans : B
Explanation: A MapReduce program executes in three stages, namely the map stage, shuffle stage, and reduce stage. 5. Which of the following is used to schedule jobs and track the jobs assigned to the Task Tracker?
A. SlaveNode B. MasterNode C. JobTracker D. Task Tracker View Answer Ans : C
Explanation: JobTracker : Schedules jobs and tracks the assign jobs to Task tracker. 6. Which of the following is used for an execution of a Mapper or a Reducer on a slice of data?
A. Task B. Job C. Mapper D. PayLoad View Answer Ans : A
Explanation: Task : An execution of a Mapper or a Reducer on a slice of data. 7. Which of the following commands runs a DFS admin client?
A. secondaryadminnode B. nameadmin C. dfsadmin D. adminsck View Answer Ans : C
Explanation: dfsadmin : Runs a DFS admin client. 8. Point out the correct statement.
A. MapReduce tries to place the data and the compute as close as possible B. Map Task in MapReduce is performed using the Mapper() function C. Reduce Task in MapReduce is performed using the Map() function D. None of the above View Answer Ans : A
Explanation: This feature of MapReduce is "Data Locality". 9. Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in ____________
A. C B. C# C. Java D. None of the above View Answer
Ans : C
Explanation: Hadoop Pipes is a SWIG- compatible C++ API to implement MapReduce applications (non JNITM based). 10. The number of maps is usually driven by the total size of ____________
A. Inputs B. Output C. Task D. None of the above View Answer Ans : A
Explanation: Total size of inputs means the total number of blocks of the input files. 1. What is the full form of HDFS?
A. Hadoop File System B. Hadoop Field System C. Hadoop File Search D. Hadoop Field search View Answer Ans : A
Explanation: Hadoop File System was developed using distributed file system design. 2. HDFS works in a __________ fashion.
A. worker-master fashion B. master-slave fashion C. master-worker fashion D. slave-master fashion View Answer Ans : B
Explanation: HDFS follows the master-slave architecture. 3. Which of the following are the Goals of HDFS?
A. Fault detection and recovery B. Huge datasets C. Hardware at data D. All of the above View Answer Ans : D
Explanation: All the above option are the goals of HDFS. 4. ________ NameNode is used when the Primary NameNode goes down.
A. Rack B. Data C. Secondary D. Both A and B View Answer Ans : C
Explanation: Secondary namenode is used for all time availability and reliability.
5. The minimum amount of data that HDFS can read or write is called a _____________.
A. Datanode B. Namenode C. Block D. None of the above View Answer Ans : C
Explanation: The minimum amount of data that HDFS can read or write is called a Block. 6. The default block size is ______.
A. 32MB B. 64MB C. 128MB D. 16MB View Answer Ans : B
Explanation: The default block size is 64MB, but it can be increased as per the need to change in HDFS configuration. 7. For every node (Commodity hardware/System) in a cluster, there will be a _________.
A. Datanode B. Namenode C. Block D. None of the above View Answer Ans : A
Explanation: For every node (Commodity hardware/System) in a cluster, there will be a datanode. 8. Which of the following is not Features Of HDFS?
A. It is suitable for the distributed storage and processing. B. Streaming access to file system data. C. HDFS provides file permissions and authentication. D. Hadoop does not provides a command interface to interact with HDFS. View Answer Ans : D
Explanation: The correct feature is Hadoop provides a command interface to interact with HDFS. 9. HDFS is implemented in _____________ language.
A. Perl B. Python C. Java D. C View Answer Ans : C
Explanation: HDFS is implemented in Java and any computer which can run Java can host a NameNode/DataNode on it.
10. During start up, the ___________ loads the file system state from the fsimage and the edits log file.
A. Datanode B. Namenode C. Block D. ActionNode View Answer Ans : B
Explanation: Any computer which can run Java can host a NameNode/DataNode, since HDFS is implemented in Java. 1. Which of the following is not true about Pig?
A. Apache Pig is an abstraction over MapReduce B. Pig can not perform all the data manipulation operations in Hadoop. C. Pig is a tool/platform which is used to analyze larger sets of data representing them as data flows. D. None of the above View Answer Ans : B
Explanation: Pig is generally used with Hadoop; we can perform all the data manipulation operations in Hadoop using Apache Pig. 2. Which of the following is/are a feature of Pig?
A. Rich set of operators B. Ease of programming C. Extensibility D. All of the above View Answer Ans : D
Explanation: All options are the following Features of Pig. 3. In which year apache Pig was released?
A. 2005 B. 2006 C. 2007 D. 2008 View Answer Ans : B
Explanation: In 2006, Apache Pig was developed as a research project. 4. Pig mainly operates in how many modes?
A. 2 B. 3 C. 4 D. 5 View Answer Ans : A
Explanation: You can run Pig (execute Pig Latin statements and Pig commands) in two modes: Interactive mode and Batch mode. 5. Which of the following companies developed Pig?
A. Google B. Yahoo C. Microsoft D. Apple View Answer Ans : B
Explanation: Apache Pig was developed as a research project at Yahoo, especially to create and execute MapReduce jobs on every dataset. 6. Which of the following function is used to read data in PIG?
A. Write B. Read C. Perform D. Load View Answer Ans : D
Explanation: PigStorage is the default load function. 7. __________ is a framework for collecting and storing script-level statistics for Pig Latin.
A. Pig Stats B. PStatistics C. Pig Statistics D. All of the above View Answer Ans : C
Explanation: The new Pig statistics and the existing Hadoop statistics can also be accessed via the Hadoop job history file. 8. Which of the following is true statement?
A. Pig is a high level language. B. Performing a Join operation in Apache Pig is pretty simple. C. Apache Pig is a data flow language. D. All of the above View Answer Ans : D
Explanation: All option are true statement. 9. Which of the following will compile the Pigunit?
A. $pig_trunk ant pigunit-jar B. $pig_tr ant pigunit-jar C. $pig_ ant pigunit-jar D. $pigtr_ ant pigunit-jar View Answer Ans : A
Explanation: The compile will create the pigunit.jar file.
10. Point out the wrong statement.
A. Pig can invoke code in language like Java Only B. Pig enables data workers to write complex data transformations without knowing Java C. Pig's simple SQL-like scripting language is called Pig Latin, and appeals to developers already familiar with scripting languages and SQL D. Pig is complete, so you can do all required data manipulations in Apache Hadoop with Pig View Answer Ans : A
Explanation: Through the User Defined Functions(UDF) facility in Pig, Pig can invoke code in many languages like JRuby, Jython and Java. 1. Which of the following is/are INCORRECT with respect to Hive?
A. Hive provides SQL interface to process large amount of data B. Hive needs a relational database like oracle to perform query operations and store data. C. Hive works well on all files stored in HDFS D. Both A and B View Answer Ans : B
Explanation: Hive needs a relational database like oracle to perform query operations and store data is incorrect with respect to Hive. 2. Which of the following is not a Features of HiveQL?
A. Supports joins B. Supports indexes C. Support views D. Support Transactions View Answer Ans : D
Explanation: Support for transactions is not a feature of HiveQL. 3. Which of the following operators executes a shell command from the Hive shell?
A. | B. ! C. # D. $ View Answer Ans : B
Explanation: Exclamation operator is for execution of command. 4. Hive uses _________ for logging.
A. logj4 B. log4l C. log4i D. log4j View Answer Ans : D
Explanation: By default Hive will use hive-log4j.default in the conf/ directory of the Hive installation. 5. HCatalog is installed with Hive, starting with Hive release ___________
A. 0.10.0 B. 0.9.0 C. 0.11.0 D. 0.12.0 View Answer Ans : C
Explanation: hcat commands can be issued as hive commands, and vice versa. 6. _______ supports a new command shell Beeline that works with HiveServer2.
A. HiveServer2 B. HiveServer3 C. HiveServer4 D. HiveServer5 View Answer Ans : A
Explanation: The Beeline shell works in both embedded mode as well as remote mode. 7. The ________ allows users to read or write Avro data as Hive tables.
A. AvroSerde B. HiveSerde C. SqlSerde D. HiveQLSerde View Answer Ans : A
Explanation: AvroSerde understands compressed Avro files. 8. Which of the following data type is supported by Hive?
A. map B. record C. string D. enum View Answer Ans : D
Explanation: Hive has no concept of enums. 9. We need to store skill set of MCQs(which might have multiple values) in MCQs table, which of the following is the best way to store this information in case of Hive?
A. Create a column in MCQs table of STRUCT data type B. Create a column in MCQs table of MAP data type C. Create a column in MCQs table of ARRAY data type D. As storing multiple values in a column of MCQs itself is a violation View Answer Ans : C
Explanation: Option C is correct.
10. Letsfindcourse is generating a huge amount of data. They are generating a huge amount of sensor data from different courses, which is unstructured in form. They moved to the Hadoop framework for storing and analyzing data. Which technology in the Hadoop framework can they use to analyse this unstructured data?
A. MapReduce programming B. Hive C. RDBMS D. None of the above View Answer Ans : A
Explanation: MapReduce programming is the right answer. 1. which of the following is correct statement?
A. HBase is a distributed column-oriented database B. Hbase is not open source C. Hbase is horizontally scalable. D. Both A and C View Answer Ans : D
Explanation: HBase is a distributed column-oriented database built on top of the Hadoop file system. It is an open-source project and is horizontally scalable. 2. which of the following is not a feature of Hbase?
A. HBase is lateral scalable. B. It has automatic failure support. C. It provides consistent read and writes. D. It has easy java API for client. View Answer Ans : A
Explanation: Option A is incorrect because HBase is linearly scalable. 3. When was HBase first released?
A. April 2007 B. March 2007 C. February 2007 D. May 2007 View Answer Ans : C
Explanation: HBase was first released in February 2007. Later in January 2008, HBase became a sub project of Apache Hadoop. 4. Apache HBase is a non-relational database modeled after Google's _________
A. BigTop B. Bigtable C. Scanner D. FoundationDB View Answer Ans : B
Explanation: Bigtable acts up on Google File System, likewise Apache HBase works on top of Hadoop and HDFS. 5. HBaseAdmin and ____________ are the two important classes in this package that provide DDL functionalities.
A. HTableDescriptor B. HDescriptor C. HTable D. HTabDescriptor View Answer Ans : A
Explanation: Java provides an Admin API to achieve DDL functionalities through programming 6. which of the following is correct statement?
A. HBase provides fast lookups for larger tables. B. It provides low latency access to single rows from billions of records C. HBase is a database built on top of the HDFS. D. All of the above View Answer Ans : D
Explanation: All the options are correct. 7. HBase supports a ____________ interface via Put and Result.
A. bytes-in/bytes-out B. bytes-in C. bytes-out D. None of the above View Answer Ans : A
Explanation: Input could be strings, numbers, complex objects, or even images as long as they can rendered as bytes. 8. Which command is used to disable all the tables matching the given regex?
A. remove all B. drop all C. disable_all D. None of the above View Answer Ans : C
Explanation: The syntax for disable_all command is as follows : hbase > disable_all 'r.*' 9. _________ is the main configuration file of HBase.
A. hbase.xml B. hbase-site.xml C. hbase-site-conf.xml D. hbase-conf.xml View Answer Ans : B
Explanation: Set the data directory to an appropriate location by opening the HBase home folder in /usr/local/HBase. 10. which of the following is incorrect statement?
A. HBase is built for wide tables B. Transactions are there in HBase. C. HBase has de-normalized data. D. HBase is good for semi-structured as well as structured data. View Answer Ans : B
Explanation: No transactions are there in HBase. 1. R was created by?
A. Ross Ihaka B. Robert Gentleman C. Both A and B D. Ross Gentleman View Answer Ans : C
Explanation: R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team. 2. R allows integration with the procedures written in the?
A. C B. Ruby C. Java D. Basic View Answer Ans : A
Explanation: R allows integration with the procedures written in the C, C++, .Net, Python or FORTRAN languages for efficiency. 3. R is free software distributed under a GNU-style copy left, and an official part of the GNU project called?
A. GNU A B. GNU S C. GNU L D. GNU R View Answer Ans : B
Explanation: R is free software distributed under a GNU-style copy left, and an official part of the GNU project called GNU S. 4. R made its first appearance in?
A. 1992 B. 1995 C. 1993 D. 1994 View Answer
Ans : C
Explanation: R made its first appearance in 1993. 5. Which of the following is true about R?
A. R is a well-developed, simple and effective programming language B. R has an effective data handling and storage facility C. R provides a large, coherent and integrated collection of tools for data analysis. D. All of the above View Answer Ans : D
Explanation: All of the above statement are true. 6. Point out the wrong statement?
A. Setting up a workstation to take full advantage of the customizable features of R is a straightforward thing B. q() is used to quit the R program C. R has an inbuilt help facility similar to the man facility of UNIX D. Windows versions of R have other optional help systems also View Answer Ans : B
Explanation: help command is used for knowing details of particular command in R. 7. Command lines entered at the console are limited to about ________ bytes
A. 4095 B. 4096 C. 4097 D. 4098 View Answer Ans : A
Explanation: Elementary commands can be grouped together into one compound expression by braces (‘{’ and ‘}’). 8. R language is a dialect of which of the following languages?
A. s B. c C. sas D. matlab View Answer Ans : A
Explanation: The R language is a dialect of S which was designed in the 1980s. Since the early 90’s the life of the S language has gone down a rather winding path. The scoping rules for R are the main feature that makes it different from the original S language. 9. How many atomic vector types does R have?
A. 3 B. 4 C. 5 D. 6 View Answer
Ans : D
Explanation: R language has 6 atomic data types. They are logical, integer, real, complex, string (or character) and raw. There is also a class for “raw” objects, but they are not commonly used directly in data analysis. 10. R files has an extension _____.
A. .S B. .RP C. .R D. .SP View Answer Ans : C
Explanation: All R files have an extension .R. R provides a mechanism for recalling and reexecuting previous commands. All S programmed files will have an extension .S. But R has many functions than S. 1. What will be output for the following code?
v <- TRUE
print(class(v))
A. logical B. Numeric C. Integer D. Complex View Answer Ans : A
Explanation: It produces the following result : [1] "logical"
2. What will be output for the following code?
v <- ""TRUE""
print(class(v))
A. logical B. Numeric C. Integer D. Character View Answer Ans : D
Explanation: It produces the following result : [1] "character"
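The two questions above generalise to R's other atomic types; an illustrative console check:
# Illustrative only: class() on a few basic R values
class(TRUE)       # "logical"
class("TRUE")     # "character" (the quotes make it a string)
class(3L)         # "integer"
class(3.5)        # "numeric"
class(2 + 3i)     # "complex"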
3. In R programming, the very basic data types are the R-objects called?
A. Lists B. Matrices
C. Vectors D. Arrays View Answer Ans : C
Explanation: In R programming, the very basic data types are the R-objects called vectors
4. Data Frames are created using the?
A. frame() function B. data.frame() function C. data() function D. frame.data() function View Answer Ans : B
Explanation: Data Frames are created using the data.frame() function 5. Which functions gives the count of levels?
A. level B. levels C. nlevels D. nlevel View Answer Ans : C
Explanation: Factors are created using the factor() function. The nlevels functions gives the count of levels. 6. Point out the correct statement?
A. Empty vectors can be created with the vector() function B. A sequence is represented as a vector but can contain objects of different classes C. "raw” objects are commonly used directly in data analysis D. The value NaN represents undefined value View Answer Ans : A
Explanation: A vector can only contain objects of the same class. 7. What will be the output of the following R code?
> x <- vector(""numeric"", length = 10)
> x
A. 1 0 B. 0 0 0 0 0 0 0 0 0 0 C. 0 1 D. 0 0 1 1 0 1 1 0 View Answer Ans : B
Explanation: You can also use the vector() function to initialize vectors.
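A short illustrative sketch pulling together the objects asked about in questions 3-7 (vectors, data frames, factors and vector()):
# Illustrative only: the basic R objects from questions 3-7
v  <- c(2, 4, 6)                          # a vector; all elements share one class
df <- data.frame(id = 1:3, score = v)     # data frames come from data.frame()
f  <- factor(c("low", "high", "low"))     # factors come from factor()
nlevels(f)                                # 2 -- the count of levels
x  <- vector("numeric", length = 10)      # a numeric vector of ten zeros
x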
8. What will be output for the following code?
> sqrt(-17)
A. -4.02 B. 4.02 C. 3.67 D. NAN View Answer Ans : D
Explanation: The square root of a negative number is not defined for real values, so R returns NaN and prints a warning ("NaNs produced"). 9. _______ function returns a vector of the same size as x with the elements arranged in increasing order.
A. sort() B. orderasc() C. orderby() D. sequence() View Answer Ans : A
Explanation: There are other more flexible sorting facilities available like order() or sort.list() which produce a permutation to do the sorting. 10. What will be the output of the following R code?
> m <- matrix(nrow = 2, ncol = 3)
> dim(m)
A. 3 3 B. 3 2 C. 2 3 D. 2 2 View Answer Ans : C
Explanation: Matrices are constructed column-wise. 1. Which loop executes a sequence of statements multiple times and abbreviates the code that manages the loop variable?
A. for B. while C. do-while D. repeat View Answer Ans : D
Explanation: repeat loop : Executes a sequence of statements multiple times and abbreviates the code that manages the loop variable. 2. Which of the following true about for loop?
A. Repeats a statement or group of statements while a given condition is true. It tests the condition before executing the loop body. B. it tests the condition at the end of the loop body. C. Both A and B D. None of the above View Answer Ans : B
Explanation: for loop : Like a while statement, except that it tests the condition at the end of the loop body. 3. Which statement simulates the behavior of R switch?
A. Next B. Previous C. break D. goto View Answer Ans : A
Explanation: The next statement simulates the behavior of R switch. 4. In which statement terminates the loop statement and transfers execution to the statement immediately following the loop?
A. goto B. switch C. break D. label View Answer Ans : C
Explanation: Break : Terminates the loop statement and transfers execution to the statement immediately following the loop. 5. Point out the wrong statement?
A. Multi-line expressions with curly braces are just not that easy to sort through when working on the command line B. lapply() loops over a list, iterating over each element in that list C. lapply() does not always return a list D. You cannot use lapply() to evaluate a function multiple times each with a different argument View Answer Ans : C
Explanation: lapply() always returns a list, regardless of the class of the input. 6. The mapply() function is a multivariate apply of sorts which applies a function in parallel over a set of arguments.
A. TRUE B. FALSE C. Can be true or false D. Can not say View Answer Ans : A
Explanation: True, The mapply() function is a multivariate apply of sorts which applies a function in parallel over a set of arguments. 7. Which of the following is valid body of split function?
A. function (x, f) B. function (x, f, drop = FALSE, …) C. function (x, drop = FALSE, …) D. function (drop = FALSE, …) View Answer Ans : B
Explanation: x is a vector (or list) or data frame. 8. Which of the following characters is skipped during execution?
v <- LETTERS[1:6]
for ( i in v) {
if (i == ""D"") {
next
}
print(i)
}
A. A B. B C. C D. D View Answer Ans : D
Explanation: When the above code is compiled and executed, it produces the following result : [1] "A" [1] "B" [1] "C" [1] "E" [1] "F"
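For contrast with next in question 8, an illustrative sketch of break, which question 4 describes (it ends the loop entirely rather than skipping one iteration):
# Illustrative only: break ends the loop; next only skips the current iteration
for (i in LETTERS[1:6]) {
  if (i == "D") {
    break          # control jumps to the first statement after the loop
  }
  print(i)         # prints "A" "B" "C", then the loop terminates
}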
9. What will be output for the following code?
v <- LETTERS[1]
for ( i in v) {
print(v)
}
A. A B. A B C. A B C D. A B C D View Answer Ans : A
Explanation: The output for the following code : [1] "A" 10. What will be output for the following code?
v <- LETTERS[""A""]
for ( i in v) {
print(v)
}
A. A B. NAN C. NA D. Error View Answer Ans : C
Explanation: The output for the following code : [1] NA 1. An R function is created by using the keyword?
A. fun B. function C. declare D. extends View Answer Ans : B
Explanation: An R function is created by using the keyword function. 2. What will be output for the following code?
print(mean(25:82))
A. 1526 B. 53.5 C. 50.5 D. 55 View Answer Ans : B
Explanation: The code will find mean of numbers from 25 to 82 that is 53.5 3. Point out the wrong statement?
A. Functions in R are “second class objects” B. The writing of a function allows a developer to create an interface to the code, that is explicitly specified with a set of parameters
C. Functions provides an abstraction of the code to potential users D. Writing functions is a core activity of an R programmer View Answer Ans : A
Explanation: Functions in R are “first class objects”, which means that they can be treated much like any other R object. 4. What will be output for the following code?
> paste("a", "b", se = ":")
A. a+b B. a:b C. a-b D. None of the above View Answer Ans : D
Explanation: With the paste() function, the arguments sep and collapse must be named explicitly and in full if the default values are not going to be used. 5. Which function in R language is used to find out whether the means of 2 groups are equal to each other or not?
A. f.tests () B. l.tests () C. t.tests () D. p.tests () View Answer Ans : C
Explanation: t.tests () function in R language is used to find out whether the means of 2 groups are equal to each other. It is not used most commonly in R. It is used in some specific conditions. 6. What will be the output of log (-5.8) when executed on R console?
A. NA B. NAN C. 0.213 D. Error View Answer Ans : B
Explanation: Executing the above on the R console or terminal will display a warning sign that NaN (Not a Number) will be produced, because it is not possible to take the log of a negative number. 7. Which function is preferred over sapply, as vapply allows the programmer to specify the output type?
A. Lapply B. Japply C. Vapply D. Zapply View Answer
Ans : C
Explanation: Vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use. simplify2array() is the utility called from sapply() when simplify is not false and is similarly called from mapply(). 8. How will you check if an element is present in a vector?
A. Match() B. Dismatch() C. Mismatch() D. Search() View Answer Ans : A
Explanation: It can be done using the match () function- match () function returns the first appearance of a particular element. The other way is to use %in% which returns a Boolean value either true or false. 9. You can check to see whether an R object is NULL with the _________ function.
A. is.null() B. is.nullobj() C. null() D. as.nullobj() View Answer Ans : A
Explanation: It is sometimes useful to allow an argument to take the NULL value, which might indicate that the function should take some specific action. 10. In the base graphics system, which function is used to add elements to a plot?
A. Boxplot() B. Text() C. Treat() D. Both A and B View Answer Ans : D
Explanation: In the base graphics system, boxplot or text function is used to add elements to a plot. 1. Which of the following syntax is used to install forecast package?
A. install.pack("forecast") B. install.packages("cast") C. install.packages("forecast") D. install.pack("forecastcast") View Answer Ans : C
Explanation: forecast is used for time series analysis 2. Which splits a data frame and returns a data frame?
A. apply B. ddply
C. stats D. plyr View Answer Ans : B
Explanation: ddply splits a data frame and returns a data frame. 3. Which of the following is an R package for the exploratory analysis of genetic and genomic data?
A. adeg B. adegenet C. anc D. abd View Answer Ans : B
Explanation: This package contains Classes and functions for genetic data analysis within the multivariate framework. 4. Which of the following contains functions for processing uniaxial minute-to-minute accelerometer data?
A. accelerometry B. abc C. abd D. anc View Answer Ans : A
Explanation: This package contains a collection of functions that perform operations on time-series accelerometer data, such as identifying non-wear time, flagging minutes that are part of an activity bout, and finding the maximum 10-minute average count value. 5. ______ Uses Grieg-Smith method on 2 dimensional spatial data.
A. G.A. B. G2db C. G.S. D. G1DBN View Answer Ans : C
Explanation: The function returns a GriegSmith object which is a matrix with block sizes, sum of squares for each block size as well as mean sums of squares. G1DBN is a package performing Dynamic Bayesian Network Inference. 6. Which of the following packages provides namespace management functions not yet present in base R?
A. stringr B. nbpMatching C. messagewarning D. namespace View Answer Ans : D
Explanation: The package namespace is one of the most confusing parts of building a package. nbpMatching contains functions for non-bipartite optimal matching. 7. What will be the output of the following R code?
install.packages(c("devtools", "roxygen2"))
A. Develops the tools B. Installs the given packages C. Exits R studio D. Nothing happens View Answer Ans : B
Explanation: Make sure you have the latest version of R and then run the above code to get the packages you’ll need. It installs the given packages. Confirm that you have a recent version of RStudio. 8. A bundled package is a package that’s been compressed into a ______ file.
A. Double B. Single C. Triple D. No File View Answer Ans : B
Explanation: A bundled package is a package that’s been compressed into a single file. A source package is just a directory with components like R/, DESCRIPTION, and so on. 9. .library() is not useful when developing a package since you have to install the package first.
A. TRUE B. FALSE C. Can be true or false D. Can not say View Answer Ans : A
Explanation: library() is not useful when developing a package since you have to install the package first. A library is a simple directory containing installed packages.
10. DESCRIPTION uses a very simple file format called DCF.
A. TRUE B. FALSE C. Can be true or false D. Can not say View Answer Ans : A
Explanation: DESCRIPTION uses a very simple file format called DCF, the Debian control format. When you first start writing packages, you’ll mostly use these metadata to record what packages are needed to run your package.
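To close this set, an illustrative sketch of the basic package workflow the questions refer to (the package names are taken from the questions themselves):
# Illustrative only: installing and loading packages mentioned above
install.packages(c("forecast", "plyr"))   # fetch and install from CRAN
library(forecast)                         # attach an installed package for use
library(plyr)
ls("package:forecast")                    # list the functions a package exports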
37. While installing Hadoop, how many XML files are edited, and which are they? 1. core-site.xml 2. hdfs-site.xml 3. mapred.xml 4. yarn.xml Ans. core-site.xml
1.1 Information is a. Data b. Processed Data c. Manipulated input d. Computer output 1.2 Data by itself is not useful unless a. It is massive b. It is processed to obtain information c. It is collected from diverse sources d. It is properly stated 1.3 For taking decisions data must be a Very accurate b Massive c Processed correctly d Collected from diverse sources 1.4 Strategic information is needed for a Day to day operations b Meet government requirements c Long range planning d Short range planning 1.5 Strategic information is required by a Middle managers b Line managers c Top managers d All workers 1.6 Tactical information is needed for a Day to day operations b Meet government requirements c Long range planning d Short range planning 1.7 Tactical information is required by
a Middle managers b Line managers c Top managers d All workers 1.8 Operational information is needed for a Day to day operations b Meet government requirements c Long range planning d Short range planning 1.9 Operational information is required by a Middle managers b Line managers c Top managers d All workers 1.10 Statutory information is needed for a Day to day operations b Meet government requirements c Long range planning d Short range planning 1.11 In motor car manufacturing the following type of information is strategic a Decision on introducing a new model b Scheduling production c Assessing competitor car d Computing sales tax collected 1.12 In motor car manufacturing the following type of information is tactical a Decision on introducing a new model b Scheduling production c Assessing competitor car d Computing sales tax collected
1.13 In motor car manufacturing the following type of information is operational
a Decision on introducing a new model b Scheduling production c Assessing competitor car d Computing sales tax collected 1.14 In motor car manufacturing the following type of information is statutory a Decision on introducing a new model b Scheduling production c Assessing competitor car d Computing sales tax collected 1.15 In a hospital information system the following type of information is strategic a Opening a new children’s ward b Data on births and deaths c Preparing patients’ bill d Buying an expensive diagnostic system such as CAT scan 1.16 In a hospital information system the following type of information is tactical a Opening a new children’s’ ward b Data on births and deaths c Preparing patients’ bill d Buying an expensive diagnostic system such as CAT scan 1.17 In a hospital information system the following type of information is operational a Opening a new children’s’ ward b Data on births and deaths c Preparing patients’ bill d Buying an expensive diagnostic system such as CAT scan 1.18 In a hospital information system the following type of information is statutory a Opening a new children’s’ ward b Data on births and deaths c Preparing patients’ bill d Buying an expensive diagnostic system such as CAT scan
1.19 A computer based information system is needed because (i) The size of organization have become large and data is massive (ii) Timely decisions are to be taken based on available data (iii) Computers are available (iv) Difficult to get clerks to process data a (ii) and (iii) b (i) and (ii) c (i) and (iv) d (iii) and (iv) 1.20 Volume of strategic information is a Condensed b Detailed c Summarized d Irrelevant 1.21 Volume of tactical information is a Condensed b Detailed c Summarized d relevant 1.22 Volume of operational information is a Condensed b Detailed c Summarized d Irrelevant 1.23 Strategic information is a Haphazard b Well organized c Unstructured d Partly structured 1.24 Tactical information is a Haphazard
b Well organized c Unstructured d Partly structured 1.25 Operational information is a Haphazard b Well organized c Unstructured d Partly structured 1.26 Match and find best pairing for a Human Resource Management System (i) Policies on giving bonus (iv) Strategic information (ii) Absentee reduction (v) Tactical information (iii) Skills inventory (vi) Operational information a (i) and (v) b (i) and (iv) c (ii) and (iv) d (iii) and (v) 1.27 Match and find best pairing for a Production Management System (i) Performance appraisal of machines to decide on replacement (iv) Strategic information (ii) Introducing new production technology (v) Tactical information (iii) Preventive maintenance schedules for machines (vi) Operational information a (i) and (vi) b (ii) and (v) c (i) and (v) d (iii) and (iv) 1.28 Match and find best pairing for a Production Management System (i) Performance appraisal of machines to decide on replacement (iv) Strategic information (ii) Introducing new production technology (v) Tactical information
(iii) Preventive maintenance schedules for machines (vi) Operational information a (iii) and (vi) b (i) and (iv) c (ii) and (v) d None of the above 1.29 Match and find best pairing for a Materials Management System (i) Developing vendor performance measures (iv) Strategic information (ii) Developing vendors for critical items (v) Tactical information (iii) List of items rejected from a vendor (vi) Operational information a (i) and (v) b (ii) and (v) c (iii) and (iv) d (ii) and (vi) 1.30 Match and find best pairing for a Materials Management System (i) Developing vendor performance measures (iv) Strategic information (ii) Developing vendors for critical items (v) Tactical information (iii) List of items rejected from a vendor (vi) Operational information a (i) and (iv) b (i) and (vi) c (ii) and (iv) d (iii) and (v) 1.31 Match and find best pairing for a Materials Management System (i) Developing vendor performance measures (iv) Strategic information (ii) Developing vendors for critical items (v) Tactical information (iii) List of items rejected from a vendor (vi) Operational information a (i) and (vi) b (iii) and (vi) c (ii) and (vi) d (iii) and (iv)
1.32 Match and find best pairing for a Finance Management System (i) Tax deduction at source report (iv) Strategic information (ii) Impact of taxation on pricing (v) Tactical information (iii) Tax planning (vi) Operational information a (i) and (v) b (iii) and (vi) c (ii) and (v) d (ii) and (iv) 1.33 Match and find best pairing for a Finance Management System (i) Budget status to all managers (iv) Strategic information (ii) Method of financing (v) Tactical information (iii) Variance between budget and expenses (vi) Operational information a (i) and (v) b (iii) and (vi) c (ii) and (v) d (ii) and (iv) 1.34 Match and find best pairing for a Marketing Management System (i) Customer preferences surveys (iv) Strategic information (ii) Search for new markets (v) Tactical information (iii) Performance of sales outlets (vi) Operational information a (i) and (iv) b (ii) and (v) c (iii) and (vi) d (ii) and (v) 1.35 Match and find best pairing for a Marketing Management System (i) Customer preferences surveys (iv) Strategic information (ii) Search for new markets (v) Tactical information (iii) Performance of sales outlets (vi) Operational information a (iii) and (iv) b (i) and (vi) c (i) and (v)
d (iii) and (v) 1.36 Match and find best pairing for a Research and Development Management System (i) Technical collaboration decision (iv) Strategic information (ii) Budgeted expenses vs actuals (v) Tactical information (iii) Proportion of budget to be allocated to various projects (vi) Operational information a (i) and (iv) b (ii) and (v) c (iii) and (vi) d (iii) and (iv) 1.37 Match and find best pairing for a Research and Development Management System (i) Technical collaboration decision (iv) Strategic information (ii) Budgeted expenses vs actuals (v) Tactical information (iii) Proportion of budget to be allocated to various projects (vi) Operational information a (i) and (v) b (iii) and (v) c (ii) and (v) d (i) and (vi) 1.38 Organizations are divided into departments because a it is convenient to do so b each department can be assigned a specific functional responsibility c it provides opportunities for promotion d it is done by every organization 1.39 Organizations have hierarchical structures because a it is convenient to do so b it is done by every organization c specific responsibilities can be assigned for each level d it provides opportunities for promotions
1.40 Which of the following functions is the most unlikely in an insurance company. a Training b giving loans c bill of material d accounting 1.41 Which of the following functions is most unlikely in a university a admissions b accounting c conducting examination d marketing 1.42 Which of the following functions is most unlikely in a purchase section of an organization. a Production planning b order processing c vendor selection d training 1.43 Which is the most unlikely function of a marketing division of an organization. a advertising b sales analysis c order processing d customer preference analysis 1.44 Which is the most unlikely function of a finance section of a company. a Billing b costing c budgeting d labor deployment 1.45 Match quality of information and how it is ensured using the following list QUALITY HOW ENSURED (i) Accurate (iv) Include all data
(ii) Complete (v) Use correct input and processing rules (iii)Timely (vi) Include all data up to present time a (i) and (v) b (ii) and (vi) c (iii) and (vi) d (i) and (iv) 1.46 Match quality of information and how it is ensured using the following list QUALITY HOW ENSURED (i) Accurate (iv) Include all data (ii) Complete (v) Use correct input and processing rules (iii) Timely (vi) Include all data up to present time a (ii) and (v) b (ii) and (vi) c (ii) and (iv) d (iii) and (iv) 1.47 Match quality of information and how it is ensured using the following list
QUALITY HOW ENSURED (i) Up-to-date (iv) Include all data to present time (ii) Brief (v) Give at right time (iii) Significance (vi) Use attractive format and understandable graphical charts
a (i) and (v) b (ii) and (vi) c (iii) and (vi) d (i) and (vi) 1.48 Match quality of information and how it is ensured using the following list QUALITY HOW ENSURED (i)Up- to-date (iv) Include all data to present time (ii)Brief (v) Give at right time
(iii) Significance (vi) Use attractive format and understandable graphical charts a (i) and (iv) b (ii) and (v) c (iii) and (iv) d (ii) and (iv) 1.49 Match quality of information and how it is ensured using the following list QUALITY HOW ENSURED (i)Brief (iv) Unpleasant information not hidden (ii)Relevant (v) Summarize relevant information (iii) Trustworthy (vi) Understands user needs a (i) and (iv) b (ii) and (v) c (iii) and (vi) d (i) and (v) 1.50 Match quality of information and how it is ensured using the following list QUALITY HOW ENSURED (i)Brief (iv) Unpleasant information not hidden (ii)Relevant (v) Summarize relevant information (iii)Trustworthy (vi) Understands user needs a (ii) and (vi) b (i) and (iv) c (iii) and (v) d (ii) and (iv) 1.51 The quality of information which does not hide any unpleasant information is known as a Complete b Trustworthy c Relevant d None of the above 1.52 The quality of information which is based on understanding user needs
a Complete b Trustworthy c Relevant d None of the above 1.53 Every record stored in a Master file has a key field because a it is the most important field b it acts as a unique identification of record c it is the key to the database d it is a very concise field 1.54 The primary storage medium for storing archival data is a floppy disk b magnetic disk c magnetic tape d CD- ROM 1.55 Master files are normally stored in a a hard disk b a tape c CD – ROM d computer’s main memory 1.56 Master file is a file containing a all master records b all records relevant to the application c a collection of data items d historical data of relevance to the organization 1.57 Edit program is required to a authenticate data entered by an operator b format correctly input data c detect errors in input data d expedite retrieving input data 1.58 Data rejected by edit program are a corrected and re- entered
b removed from processing c collected for later use d ignored during processing 1.59 Online transaction processing is used because a it is efficient b disk is used for storing files c it can handle random queries. d Transactions occur in batches 1.60 On-line transaction processing is used when i) it is required to answer random queries ii) it is required to ensure correct processing iii) all files are available on-line iv) all files are stored using hard disk a i ,ii b i, iii c ii ,iii, iv d i , ii ,iii 1.61 Off-line data entry is preferable when i) data should be entered without error ii) the volume of data to be entered is large iii) the volume of data to be entered is small iv) data is to be processed periodically a i, ii b ii, iii c ii, iv d iii, iv 1.62 Batch processing is used when i) response time should be short ii) data processing is to be carried out at periodic intervals iii) transactions are in batches iv) transactions do not occur periodically
a i ,ii b i ,iii,iv c ii ,iii d i , ii ,iii 1.63 Batch processing is preferred over on-line transaction processing when i) processing efficiency is important ii) the volume of data to be processed is large iii) only periodic processing is needed iv) a large number of queries are to be processed a i ,ii b i, iii c ii ,iii d i , ii ,iii 1.64 A management information system is one which a is required by all managers of an organization b processes data to yield information of value in tactical management c provides operational information d allows better management of organizations 1.65 Data mining is used to aid in a operational management b analyzing past decision made by managers c detecting patterns in operational data d retrieving archival data 1.66 Data mining requires a large quantities of operational data stored over a period of time b lots of tactical data c several tape drives to store archival data d large mainframe computers 1.67 Data mining can not be done if a operational data has not been archived b earlier management decisions are not available
c the organization is large d all processing had been only batch processing 1.68 Decision support systems are used for a Management decision making b Providing tactical information to management c Providing strategic information to management d Better operation of an organization 1.69 Decision support systems are used by a Line managers. b Top-level managers. c Middle level managers. d System users 1.70 Decision support systems are essential for a Day–to-day operation of an organization. b Providing statutory information. c Top level strategic decision making. d Ensuring that organizations are profitable.
Key to Objective Questions
1.1 b 1.2 b 1.3 c 1.4 c 1.5 c 1.6 d
1.7 a 1.8 a 1.9 b 1.10 b 1.11 a 1.12 c
1.13 b 1.14 d 1.15 d 1.16 a 1.17 c 1.18 b
1.19 b 1.20 a 1.21 c 1.22 b 1.23 c 1.24 d
1.25 b 1.26 b 1.27 c 1.28 a 1.29 a 1.30 c
1.31 b 1.32 c 1.33 d 1.34 c 1.35 c 1.36 a
1.37 b 1.38 b 1.39 c 1.40 c 1.41 d 1.42 a
1.43 c 1.44 d 1.45 a 1.46 c 1.47 c 1.48 a
1.49 d 1.50 a 1.51 b 1.52 c 1.53 b 1.54 c
1.55 a 1.56 b 1.57 c 1.58 a 1.59 c 1.60 b
1.61 c 1.62 c 1.63 d 1.64 b 1.65 c 1.66 a
1.67 a 1.68 c 1.69 b 1.70 c