comp565_fall2023_A1.pdf
School: McGill University
Course: 565
Subject: Computer Science
Date: Dec 6, 2023
Pages: 4
Uploaded by stephenlu2002 on coursehero.com
Assignment 1
COMP 565 ML in Genomics and Healthcare
This assignment is worth 8% of your total grade and is due at
midnight on September 25, 2023.
Question 1 [2%] Implementing LD score regression
For a phenotype of interest, we have collected the marginal statistics $\tilde{\beta}$ for $M = 4268$ SNPs and
the $M \times M$ LD matrix $R$ (i.e., pairwise SNP-SNP Pearson correlation). The marginal statistics
are based on $N = 1000$ individuals. Download the marginal statistics and LD matrix from here:
https://drive.google.com/drive/folders/1tq4bTdbsv1iwO4wHxq1smzoN9D5luapp?usp=sharing
For this question, you may also assume there is no population stratification in this dataset. Both
phenotype and genotype were standardized.
Implement the very basic LD score regression algorithm with a programming language of your
choice (preferably Python or R) to estimate the heritability of the phenotype.
What’s your estimate of the heritability?
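One way to read the setup above is as a regression of per-SNP chi-square statistics on LD scores. The sketch below is a minimal illustration under stated assumptions, not the assignment's reference solution: it assumes the chi-square statistic of SNP j is N times the squared marginal statistic (which relies on the standardization stated above), takes the LD score of SNP j as the sum of squared correlations in row j of R, and fixes the intercept at 1 because no population stratification is assumed. The function names `ld_scores` and `ldsc_heritability` are hypothetical, and the sketch omits the regression weights and the bias-corrected r² used by full LD score regression.

```python
import numpy as np


def ld_scores(R):
    """LD score of each SNP: sum of squared correlations with all SNPs (including itself)."""
    R = np.asarray(R, dtype=float)
    return (R ** 2).sum(axis=1)


def ldsc_heritability(beta, R, N):
    """Estimate heritability from marginal statistics and an LD matrix.

    Model (assumption of this sketch): E[chi2_j] = (N * h2 / M) * ell_j + 1,
    with the intercept fixed at 1 (no population stratification).
    """
    beta = np.asarray(beta, dtype=float)
    M = beta.shape[0]
    ell = ld_scores(R)
    # Chi-square statistics, assuming standardized genotype and phenotype.
    chi2 = N * beta ** 2
    # Least-squares slope through the origin for (chi2 - 1) regressed on ell.
    slope = (ell @ (chi2 - 1.0)) / (ell @ ell)
    # slope = N * h2 / M, so rescale to recover h2.
    return slope * M / N
```

To apply it, load the downloaded marginal statistics and LD matrix into NumPy arrays (their file names and formats are not given here) and call `ldsc_heritability(beta, R, N=1000)`. Fixing the intercept is a design choice that follows from the no-stratification assumption; fitting a free intercept would be the alternative if that assumption were dropped.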
Submit your answer to this question as an iPython notebook named COMP565_A1_ldsr.ipynb
or an R Markdown file named COMP565_A1_ldsr.Rmd on MyCourses. This way the TA can run your code to
validate its output. Do not submit the data provided to you; just make sure your code has a clear path to
the data it reads.
Question 2 [6%] Bayesian fine-mapping
For a phenotype of interest, we have identified a GWAS locus based on N=498 individuals,
which harbours 100 SNPs. As shown in Figure 1, because of the extensive LD, identifying the
causal SNPs based on the p-values of the z-scores alone is error-prone. Because this is an
assignment, I have highlighted the causal SNPs, namely rs10104559, rs1365732, and rs12676370, but
of course in real-world applications we will not know them.
Figure 1: Manhattan plot for the GWAS locus to fine-map. The causal SNPs are coloured
in red, although in practice we will not know which SNPs are causal.
Download the marginal z-score and LD matrix from here:
https://drive.google.com/drive/folders/1tr7BCceyIcKxiO_i6iCNjvk44HHpImgG?usp=sharing
Your task is to implement a simplified version of the FINEMAP algorithm discussed in Lecture
5. To make the task easier, you may assume there are at most 3 causal SNPs in the locus.
You can divide the task into four smaller tasks:
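As a rough illustration of what fine-mapping with at most 3 causal SNPs can look like (a sketch under assumptions, not the assignment's prescribed steps or the lecture's exact formulation), the code below enumerates every configuration of 1 to 3 causal SNPs, scores each with a z-score Bayes factor, and sums the configuration posteriors into per-SNP posterior inclusion probabilities. It swaps FINEMAP's shotgun stochastic search for exhaustive enumeration, which is feasible for 100 SNPs. The function names, the prior effect-size standard deviation `prior_sd`, the per-SNP prior causal probability of 1/m, the exclusion of the null configuration, and the small `ridge` term added for numerical stability are all assumptions of this sketch.

```python
import itertools

import numpy as np


def log_mvn_zero_mean(x, cov):
    """Log density of a zero-mean multivariate normal N(0, cov) evaluated at x."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + quad)


def finemap_exhaustive(z, R, n, max_causal=3, prior_sd=0.05, ridge=1e-6):
    """Posterior over causal configurations by exhaustive enumeration.

    Returns (configs, posterior, pip), where pip[j] is the posterior
    inclusion probability of SNP j.
    """
    z = np.asarray(z, dtype=float)
    R = np.asarray(R, dtype=float)
    m = len(z)
    configs, log_post = [], []
    for k in range(1, max_causal + 1):
        # Assumed prior: each SNP is causal independently with probability 1/m.
        log_prior = k * np.log(1.0 / m) + (m - k) * np.log(1.0 - 1.0 / m)
        for c in itertools.combinations(range(m), k):
            idx = list(c)
            z_c = z[idx]
            # Ridge keeps the sub-matrix invertible when SNPs are in near-perfect LD.
            R_cc = R[np.ix_(idx, idx)] + ridge * np.eye(k)
            # Under the causal model the causal z-scores gain variance n * prior_sd^2.
            cov_alt = R_cc + n * prior_sd ** 2 * R_cc @ R_cc
            log_bf = log_mvn_zero_mean(z_c, cov_alt) - log_mvn_zero_mean(z_c, R_cc)
            configs.append(c)
            log_post.append(log_bf + log_prior)
    log_post = np.array(log_post)
    # Normalize in log space to avoid overflow.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    pip = np.zeros(m)
    for c, p in zip(configs, post):
        pip[list(c)] += p
    return configs, post, pip
```

With 100 SNPs this visits 100 + 4950 + 161700 = 166750 configurations, so a plain Python loop is slow but tractable. The Bayes factor only needs the causal SNPs' z-scores and their LD sub-matrix, which is why each configuration involves at most a 3 × 3 linear solve.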