Suggestion for creation of OpenCV technical notes #17217

catree · 2020-05-05T00:34:55Z

In my opinion, there should be some kind of OpenCV technical notes for certain important algorithms. The idea is to summarize in a document different things like implementation details, implementation choices, deviation from the original algorithm, performance or accuracy benchmarks.

Cons:

can almost double the development time, since a proper benchmark and a proper method for analyzing the results have to be developed
can become obsolete quickly
more work to do, time consuming, tedious

Pros:

implementation details can be useful for some users
better for the research community, since it should avoid confusion between the OpenCV implementation and the original implementation; could maybe be citable?

Here are some examples I have in mind:

SIFT features

20 years later, SIFT features are seeing a "revival"; it would be great to be able to summarize the performance and the accuracy of the OpenCV SIFT implementation, including the upcoming developments
references could be Lowe's SIFT binary and the vl_sift implementation
from this recent paper (Image Matching across Wide Baselines: From Paper to Practice), it looks like the OpenCV SIFT implementation performs correctly
setting up the dataset and the methodology for a proper accuracy benchmark will probably be time consuming

SURF features

a possible alternative to SIFT is SURF
there is an old benchmark page (2012) comparing the OpenCV SURF implementations with other libraries
it performs badly; this is an old benchmark, but a quick look at the history shows few changes since then
this means that methods benchmarked against the OpenCV SURF implementation could appear to perform better than they really do (since the OpenCV implementation is less accurate than the original SURF implementation)
a lot of effort is needed to check and improve the implementation, so this will probably not happen soon
there are also CUDA and OpenCL SURF implementations in OpenCV; ideally the three implementations should give the same results, so more work is needed

Harris corners detection

in theory the Harris corners method should be rotation invariant
a user stumbled upon this issue, where the OpenCV implementation of Harris corners is not rotation invariant
here is the reported issue

the thing is that the OpenCV implementation deviates from the original method: for instance, it uses a box blur instead of a Gaussian blur for performance reasons. Here is a paper with some info about the OpenCV implementation: An Analysis and Implementation of the Harris Corner Detector
by tweaking the parameters in goodFeaturesToTrack() instead of manually retrieving the corners from the output of cornerHarris(), better results can be achieved. Here is the link; left is the result from DIPlib, right is OpenCV

for this kind of issue, it would have been great to have the implementation details of the OpenCV / IPP Harris corners detector summarized somewhere

AprilTag

I like the AprilTag fiducial marker detector
there is a GSoC subject tackling this topic
ideally, the OpenCV AprilTag implementation should give exactly the same results as the original implementation
here is the official repo for the AprilTag 3 version described in this paper: Flexible Layouts for Fiducial Tags
in my opinion, if the OpenCV implementation deviates and gives poorer results than the original code, a warning should be put in the documentation to tell users that the results are inferior to those of the original code
and ideally global performance and accuracy results should be documented somewhere
I would also advise against mixing ArUco and AprilTag methods in the code:
- there are already some differences between the OpenCV ArUco and the author's latest ArUco development (Aruco 3)
- but I think that most of the time, when ArUco is mentioned, the OpenCV implementation is meant
- to avoid confusion, it would be better in my opinion to have something like a parent class for fiducial markers and implementation classes for ArUco and AprilTag methods
from my experience, AprilTag 3 is better at detecting tags and gives more accurate tag corner locations than OpenCV ArUco
for instance a quick test:
this is AprilTag 3
this is OpenCV ArUco with DICT_6x6 and refine=None:
this is OpenCV ArUco with DICT_6x6 and refine=Subpixel:
this is OpenCV ArUco with DICT_6x6 and refine=contour:
this is OpenCV ArUco with DICT_6x6 and refine=AprilTag 2:
this is a quick test, I did not try to tweak the ArUco parameters
the detection rate of AprilTag 3 should be a little bit better than OpenCV ArUco, but the accuracy of the corner extraction should definitely be better with AprilTag 3
no idea why changing the ArUco refine method gives different detection results, I am using example_aruco_detect_markers sample
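
The "parent class for fiducial markers" idea above could look roughly like the following sketch. All names here (FiducialDetector, ArucoBackend, AprilTagBackend, MarkerDetection, detect_all) are hypothetical and do not correspond to existing OpenCV classes; the detect() bodies are placeholders.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class MarkerDetection:
    """One detected marker: its id and its four corners in image coordinates."""
    marker_id: int
    corners: List[Point]


class FiducialDetector(ABC):
    """Hypothetical common interface shared by all fiducial marker families."""
    family: str = "unknown"

    @abstractmethod
    def detect(self, image) -> List[MarkerDetection]:
        """Return the markers found in `image`."""


class ArucoBackend(FiducialDetector):
    family = "ArUco"

    def detect(self, image) -> List[MarkerDetection]:
        return []  # placeholder: a real class would wrap the ArUco pipeline


class AprilTagBackend(FiducialDetector):
    family = "AprilTag"

    def detect(self, image) -> List[MarkerDetection]:
        return []  # placeholder: a real class would wrap the AprilTag pipeline


def detect_all(detectors: List[FiducialDetector], image) -> Dict[str, List[MarkerDetection]]:
    """Run every detector on the same image, keyed by marker family."""
    return {d.family: d.detect(image) for d in detectors}


results = detect_all([ArucoBackend(), AprilTagBackend()], image=None)
print(results)
```

With such a split, user code depends only on the parent interface, and the documentation of each implementation class can state how it differs from the original code.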

Pixel coordinates system

for image resizing, warping and maybe some other operations, OpenCV treats coordinates using top-left pixel coordinates
there is some info in the doc, but more details would probably help, since other libraries can use a different convention
also, using top-left coordinates should introduce a shift, which is not desirable for image analysis
related issues: 9096, 10146, and maybe also 12680
with Deep Learning, I think I have read that different image resizing methods can give notably different results?
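
To make the shift concrete, here is a small sketch of two common ways to map a destination pixel index back to a source coordinate when resizing. The function names are made up for illustration, and which convention a given OpenCV function follows is not asserted here; that is precisely what the documentation should spell out.

```python
def src_coord_origin_aligned(dst_index: int, scale: float) -> float:
    """Indices are scaled directly: src = dst * scale."""
    return dst_index * scale


def src_coord_center_aligned(dst_index: int, scale: float) -> float:
    """Pixel i has its centre at i + 0.5: src = (dst + 0.5) * scale - 0.5."""
    return (dst_index + 0.5) * scale - 0.5


scale = 2.0  # source size / destination size, i.e. downsampling by 2
shifts = []
for i in range(3):
    a = src_coord_origin_aligned(i, scale)
    b = src_coord_center_aligned(i, scale)
    shifts.append(b - a)
    print(f"dst {i}: origin-aligned -> {a}, centre-aligned -> {b}")

# The two conventions differ by a constant offset of 0.5 * (scale - 1).
print("shift:", shifts)
```

For a downsampling factor of 2 the two conventions disagree by half a source pixel at every position, which is the kind of systematic offset that matters for measurements and for comparing results across libraries.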

For important new algorithms and new developments, this kind of implementation detail or performance/accuracy benchmark should be made available from the OpenCV doc. This could simply be in Doxygen form, or maybe even in PDF form for easy citation?

asmorkalov · 2020-05-08T09:44:29Z

@vpisarev could you look at it?

catree · 2020-05-10T18:32:02Z

On reflection, the "technical note" wording (in the sense of something citable) is probably too "strong". I see G-API and maybe the future SIFT implementation improvements as things that could fit.

What I would like to emphasize with the different examples is the need for a strong focus on OpenCV documentation and tutorials.

Harris example:

for the original issue, the code that extracts the Harris corners from the response map most likely comes from this tutorial
the issue is that, in my opinion, the goodFeaturesToTrack() function or the GFTTDetector detector should be used instead of the cornerHarris() function
cornerHarris() can be used in the tutorial to explain the theory, but for practical use goodFeaturesToTrack() should be preferred, to avoid post-processing the response map manually
another issue: what exactly does cornerHarris() return? See the following code:

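import numpy as np
import cv2 as cv

# Harris parameters
blockSize = 1
apertureSize = 3
k = 0.04

# 8x8 black image with a filled 4x4 white square in the middle
img = np.zeros((8, 8), dtype=np.uint8)
cv.rectangle(img, (2, 2), (5, 5), 255, thickness=-1)

# Same image, once as uint8 and once as float32
dst = cv.cornerHarris(img, blockSize, apertureSize, k)
dst_flt = cv.cornerHarris(img.astype(np.float32), blockSize, apertureSize, k)

print('img:\n', img)
print('dst:\n', dst)
print('dst_flt:\n', dst_flt)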

it returns:

img:
 [[  0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0]
 [  0   0 255 255 255 255   0   0]
 [  0   0 255 255 255 255   0   0]
 [  0   0 255 255 255 255   0   0]
 [  0   0 255 255 255 255   0   0]
 [  0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0]]
dst:
 [[ 0.        0.        0.        0.        0.        0.        0.
   0.      ]
 [ 0.       -0.000625 -0.015625 -0.04     -0.04     -0.015625 -0.000625
   0.      ]
 [ 0.       -0.015625 -0.050625 -0.04     -0.04     -0.050625 -0.015625
   0.      ]
 [ 0.       -0.04     -0.04      0.        0.       -0.04     -0.04
   0.      ]
 [ 0.       -0.04     -0.04      0.        0.       -0.04     -0.04
   0.      ]
 [ 0.       -0.015625 -0.050625 -0.04     -0.04     -0.050625 -0.015625
   0.      ]
 [ 0.       -0.000625 -0.015625 -0.04     -0.04     -0.015625 -0.000625
   0.      ]
 [ 0.        0.        0.        0.        0.        0.        0.
   0.      ]]
dst_flt:
 [[ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]
 [ 0.0000000e+00 -2.6426565e+06 -6.6066416e+07 -1.6913002e+08
  -1.6913002e+08 -6.6066416e+07 -2.6426565e+06  0.0000000e+00]
 [ 0.0000000e+00 -6.6066416e+07 -2.1405517e+08 -1.6913002e+08
  -1.6913002e+08 -2.1405517e+08 -6.6066416e+07  0.0000000e+00]
 [ 0.0000000e+00 -1.6913002e+08 -1.6913002e+08  0.0000000e+00
   0.0000000e+00 -1.6913002e+08 -1.6913002e+08  0.0000000e+00]
 [ 0.0000000e+00 -1.6913002e+08 -1.6913002e+08  0.0000000e+00
   0.0000000e+00 -1.6913002e+08 -1.6913002e+08  0.0000000e+00]
 [ 0.0000000e+00 -6.6066416e+07 -2.1405517e+08 -1.6913002e+08
  -1.6913002e+08 -2.1405517e+08 -6.6066416e+07  0.0000000e+00]
 [ 0.0000000e+00 -2.6426565e+06 -6.6066416e+07 -1.6913002e+08
  -1.6913002e+08 -6.6066416e+07 -2.6426565e+06  0.0000000e+00]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]]

most likely IPP is used for certain input parameters, but this is not mentioned in the doc (one can expect IPP to be used, but I think in this case only for certain parameter sizes)
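
One thing the two printed arrays above do show is that the uint8 and float32 responses have the same pattern but differ by a constant factor. The sketch below checks, on four values copied from the printed output, that this factor is 255^4. This is consistent with 8-bit input being rescaled by 1/255 before the derivatives are computed, but that interpretation is an inference from the numbers, not something stated in the doc.

```python
import math

# Values copied from the printed output above (same positions in both arrays).
dst_uint8 = [-0.000625, -0.015625, -0.04, -0.050625]
dst_float32 = [-2.6426565e+06, -6.6066416e+07, -1.6913002e+08, -2.1405517e+08]

# The Harris response is a degree-4 polynomial in the image derivatives,
# so rescaling the input by 255 rescales the response by 255**4.
factor = 255.0 ** 4

for u, f in zip(dst_uint8, dst_float32):
    print(f"uint8 {u:>10} * 255^4 = {u * factor:.1f}   float32 {f:.1f}")

# True when every float32 value matches the rescaled uint8 value
# up to float32 printing precision.
consistent = all(
    math.isclose(u * factor, f, rel_tol=1e-6)
    for u, f in zip(dst_uint8, dst_float32)
)
print("consistent with a 255^4 factor:", consistent)
```

In practice this means that an absolute threshold on the cornerHarris() output depends on the input type, which is one more detail worth documenting.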

AprilTag3

AprilTag2 dictionaries can already be decoded, see opencv/opencv_contrib#1637
AprilTag2 corners extraction accuracy can be obtained thanks to this PR: opencv/opencv_contrib#1570
having AprilTag3 in OpenCV would be great but only if retaining the original performance and accuracy in my opinion
otherwise, the doc should explicitly state the performance of the current OpenCV implementation and link to the original code, to avoid confusion between the OpenCV implementation and the original code
since OpenCV is much "bigger", I would find it unfortunate for users to get bad performance with the OpenCV AprilTag code while the original code works just fine
another example of disappointing performance: the OpenCV QR code implementation

To summarize yet another overly long post: better documentation is needed.
For sure, human resources and funding are lacking. Improving the OpenCV tutorials is not something interesting for a GSoC student. It is also a dedicated job. Hopefully, in the future the following improvements could be made:

refresh, update, improve the starter tutorials for newcomers:
- see for instance: Why is so difficult to install Open CV in Windows?
- this is not representative since there are always more complaints, but the introductory OpenCV tutorials are definitely outdated
update and improve the other tutorials
- e.g. on skimage: gallery
main documentation needs improvements, for instance:
- which function is/can be accelerated with IPP? with OpenCL?
- input/output accepted types for function parameters
- document when inplace parameter is possible or not
- sometimes implementation details can be useful

Having more implementation details and performance and accuracy results would be great, but the priority is definitely the documentation.

catree · 2020-05-10T18:33:31Z

To finish with an intentionally provocative post, this is a comment about code quality and software design in OpenCV.

In my opinion, there are some observations that deserve a look in the linked comment. For instance:

better API design for the user:
- it should be better now that only source compatibility is required but still
- better design to avoid having too many overloaded functions with different parameters
- better documentation to know which input is accepted, see also #4449
- issue with consistency in function design, see also #10631
about the "generic RANSAC kernel design":
- yes, ideally a generic RANSAC should be used, to be able to reuse it for Homography, PnP, etc.
- this should be already the case in some part I think, but since there is a GSoC focusing on RANSAC, it would be great to have something generic, that can be easily tuned or adapted for the different estimation methods (Homography, Essential matrix, ...) if it is possible
- in general it is the lack of genericity that seems problematic
about the "kitchen sink":
- new features are being added or will be added, but at the same time there are already some issues in the existing code that should be fixed
- disappointing performance of certain features
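
To illustrate what a "generic RANSAC kernel" could mean, here is a minimal sketch in which the loop knows nothing about the model being estimated: the estimator is passed in as a fit function and a residual function. This is not the OpenCV design; the names ransac, fit_line and line_residual are made up for the example, and a 2D line stands in for Homography, PnP, etc.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

Model = TypeVar("Model")
Sample = TypeVar("Sample")


def ransac(
    data: Sequence[Sample],
    fit: Callable[[Sequence[Sample]], Model],
    residual: Callable[[Model, Sample], float],
    min_samples: int,
    threshold: float,
    iterations: int = 200,
    seed: int = 0,
) -> Tuple[Model, List[Sample]]:
    """Generic RANSAC loop; the model type is defined only by `fit` and `residual`."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        subset = rng.sample(list(data), min_samples)
        try:
            model = fit(subset)
        except ZeroDivisionError:
            continue  # degenerate minimal sample, draw again
        inliers = [d for d in data if residual(model, d) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    if best_model is None:
        raise ValueError("no valid model found")
    return best_model, best_inliers


# Example estimator: a 2D line y = a*x + b fitted from two points.
def fit_line(pts):
    (x0, y0), (x1, y1) = pts
    a = (y1 - y0) / (x1 - x0)
    return a, y0 - a * x0


def line_residual(model, pt):
    a, b = model
    return abs(pt[1] - (a * pt[0] + b))


points = [(float(x), 2.0 * x + 1.0) for x in range(10)]  # inliers on y = 2x + 1
points += [(3.0, 40.0), (7.0, -25.0)]                     # two gross outliers
(a, b), inliers = ransac(points, fit_line, line_residual, min_samples=2, threshold=0.5)
print(f"a={a}, b={b}, inliers={len(inliers)}")
```

Swapping in another estimation problem only requires a different fit/residual pair and minimal sample size; the sampling and scoring loop stays the same.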

This is a "rant post".

The human and financial resources allocated to the OpenCV project should certainly be taken into account, but a focus on API design and code quality is still relevant. Also, due to the long OpenCV history, some changes cannot be made without breaking user code.

Participation from the OpenCV community is probably disappointing compared to the size of the OpenCV user base. There are still some nice contributions, like the CUDA DNN implementation.

catree · 2020-06-04T20:02:02Z

@lydiakravchenko

Apologies for answering here.

I don't think my posts are suitable for the https://opencv.org/ homepage. They are mostly criticisms, which I hope are constructive, plus some suggestions for improvement.

Rather, some ideas that I think would be more suitable for the https://opencv.org/ homepage:

recent improvements / features about DNN and CUDA capability:
- performance numbers for the CUDA DNN backend?
- newly supported DNN networks like Efficient-Net or YOLOv4?
- see the corresponding PR and ask the author whether he wants to advertise his work on the homepage?
results / reports once GSoC 2020 is finished:
- for instance, a post like this? Open Robotics welcomes our GSoC 2020 students!
- once GSoC 2020 is finished, publish the reports on the homepage? Google Summer Of Code Project Improvements to Motion Planning Support
- I think the reports for previous GSoC editions are buried in the different PRs. It would be great to have summary pages on the homepage explaining the contributions, written by the GSoC students (like the previous MoveIt link).
links to OpenCV talks if any?
advertisement of some interesting contrib features (more advertisement of stable features from the OpenCV contrib module would be desirable)?

In general, I think motivations for writing on the OpenCV homepage would be:

communication from the inside of the OpenCV team (new features, new releases, general news, ...)
for external authors, a means to advertise their work or to publish new contributions and features (e.g. a new stereo matching method, new feature matching methods, etc.)

Finally, maybe something like the ROS Discourse? But the community must be big enough for an OpenCV Discourse to be useful.

asmorkalov added category: documentation pr: Discussion Required labels May 5, 2020

asmorkalov assigned lydiakravchenko May 15, 2020
