Suggestion for creation of OpenCV technical notes #17217

Open
catree opened this issue May 5, 2020 · 4 comments

Comments

@catree
Contributor

@catree catree commented May 5, 2020

In my opinion, there should be some kind of OpenCV technical notes for certain important algorithms. The idea is to summarize in one document things like implementation details, implementation choices, deviations from the original algorithm, and performance or accuracy benchmarks.

Cons:

  • can almost double the development time, since a proper benchmark and a proper method to analyze the results must be developed
  • can become obsolete quickly
  • more work to do, time-consuming, tedious

Pros:

  • implementation details can be useful for some users
  • better for the research community, since it should avoid confusion between the OpenCV implementation and the original implementation; could maybe be citable?

Here are some examples I have in mind:

SIFT features

  • 20 years later, SIFT features are seeing a "revival"; it would be great to summarize the performance and the accuracy of the OpenCV SIFT implementation, taking into account the latest and upcoming developments
  • references could be Lowe's SIFT binary and the vl_sift implementation
  • according to this recent paper (Image Matching across Wide Baselines: From Paper to Practice), the OpenCV SIFT implementation appears to perform correctly
    [image]
  • building the dataset and the methodology for a proper accuracy benchmark will probably be time-consuming

SURF features

  • SURF is a possible alternative to SIFT
  • there is an old benchmark page (2012) comparing the OpenCV SURF implementations with other libraries
  • it performs badly; the benchmark is old, but a quick look at the history shows few new changes since then
  • this means that methods benchmarked against the OpenCV SURF implementation could look better than they should (since the OpenCV implementation is less accurate than the original SURF implementation)
  • a lot of effort is needed to check and improve the implementation, so this will probably not happen soon
  • there are also CUDA and OpenCL SURF implementations in OpenCV; ideally the three implementations should give the same results, so more work is needed

Harris corners detection

  • in theory, the Harris corner detector should be rotation invariant
  • a user stumbled upon an issue where the OpenCV implementation of the Harris corner detector is not rotation invariant
  • here is the reported issue

  • the thing is that the OpenCV implementation deviates from the original method, since for instance it uses a box blur instead of a Gaussian blur for performance reasons; here is a paper with some information about the OpenCV implementation: An Analysis and Implementation of the Harris Corner Detector
  • by tweaking the parameters of goodFeaturesToTrack() instead of manually retrieving the corners from the output of cornerHarris(), better results can be achieved. Here is the link; left is the result from DIPlib, right is OpenCV

  • for this kind of issue, it would have been great to have the implementation details of the OpenCV / IPP Harris corner detector summarized somewhere

AprilTag

  • I like the AprilTag fiducial marker detector

  • there is a GSoC subject tackling this topic

  • ideally, the OpenCV AprilTag implementation should give exactly the same results as the original implementation

  • here is the official repo for the AprilTag 3 version described in this paper: Flexible Layouts for Fiducial Tags

  • in my opinion, if the OpenCV implementation deviates from the original code and gives poorer results, a warning should be put in the documentation to tell the user that the results are inferior to the original code

  • and ideally overall performance and accuracy results should be documented somewhere

  • I would also advise against mixing the ArUco and AprilTag methods in the code:

    • there are already some differences between OpenCV ArUco and the author's latest ArUco development (Aruco 3)
    • but I think that most of the time ArUco is mentioned, it refers to the OpenCV implementation
    • to avoid confusion, it would be better in my opinion to have something like a parent class for fiducial markers and implementation classes for the ArUco and AprilTag methods
  • from my experience, AprilTag 3 is better at detecting tags and gives more accurate tag corner locations than OpenCV ArUco

  • for instance, a quick test:

  • this is AprilTag 3:
    [image: AprilTag_detections]

  • this is OpenCV ArUco with DICT_6x6 and refine=None:
    [image: ArUco_detections_refine_0]

  • this is OpenCV ArUco with DICT_6x6 and refine=Subpixel:
    [image: ArUco_detections_refine_1]

  • this is OpenCV ArUco with DICT_6x6 and refine=Contour:
    [image: ArUco_detections_refine_2]

  • this is OpenCV ArUco with DICT_6x6 and refine=AprilTag 2:
    [image: ArUco_detections_refine_3]

  • this is a quick test, I did not try to tweak the ArUco parameters

  • the detection rate of AprilTag 3 should be a little better than OpenCV ArUco, but the accuracy of the corner extraction should definitely be better with AprilTag 3

  • no idea why changing the ArUco refine method gives different detection results; I am using the example_aruco_detect_markers sample
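
The "parent class for fiducial markers" idea above could look roughly like the following sketch. Every name here (MarkerDetection, FiducialDetector, ArucoBackend, AprilTagBackend) is hypothetical and does not correspond to an existing OpenCV class; the detect() bodies are placeholders.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MarkerDetection:
    """One detected marker (hypothetical result type)."""
    marker_id: int
    corners: List[Tuple[float, float]]  # four (x, y) corners in pixel coordinates


class FiducialDetector(ABC):
    """Hypothetical common interface shared by all fiducial marker methods."""

    @abstractmethod
    def detect(self, image) -> List[MarkerDetection]:
        """Return the markers found in `image`."""


class ArucoBackend(FiducialDetector):
    def detect(self, image) -> List[MarkerDetection]:
        # Placeholder: a real version would run the ArUco pipeline here.
        return []


class AprilTagBackend(FiducialDetector):
    def detect(self, image) -> List[MarkerDetection]:
        # Placeholder: a real version would run the AprilTag pipeline here.
        return []


def count_markers(detector: FiducialDetector, image) -> int:
    # Calling code depends only on the parent interface, never on the method used.
    return len(detector.detect(image))


print(count_markers(ArucoBackend(), None), count_markers(AprilTagBackend(), None))
```

With this kind of split, the documentation and benchmarks of each method can stay attached to its own class, which is the confusion the separation is meant to avoid.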

Pixel coordinates system

  • for image resizing, warping and maybe some other operations, OpenCV uses top-left pixel coordinates
  • there is some information in the doc, but more details would probably be helpful since other libraries can use a different convention
  • also, using top-left coordinates should introduce a shift, which is not desirable for image analysis
  • related issues: 9096, 10146, and maybe also 12680
  • with Deep Learning, I think I have read that different image resizing methods can give notably different results?
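
To make the shift concrete, here is a small NumPy-only sketch of the two usual conventions for mapping a destination pixel index back to a source coordinate when resizing. It does not call OpenCV and makes no claim about which convention a given OpenCV function uses; the function names are made up for the example.

```python
import numpy as np


def src_coord_top_left(dst_idx, scale):
    # Convention A: index i sits at continuous position i (pixel origin),
    # so scaling is applied to the index directly.
    return dst_idx * scale


def src_coord_pixel_center(dst_idx, scale):
    # Convention B: index i sits at continuous position i + 0.5 (pixel centre),
    # so the half-pixel offset is added before scaling and removed afterwards.
    return (dst_idx + 0.5) * scale - 0.5


scale = 2.0                # source is twice as large as the destination
dst_idx = np.arange(4)     # first four destination pixels along one axis

top_left = src_coord_top_left(dst_idx, scale)
center = src_coord_pixel_center(dst_idx, scale)
shift = center - top_left  # constant offset between the two conventions

print("top-left:    ", top_left)
print("pixel-center:", center)
print("shift:       ", shift)
```

The offset between the two conventions is (scale - 1) / 2 source pixels, so two libraries that disagree on the convention sample the source image at systematically different positions.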

For important new algorithms and new developments, this kind of implementation details or performance/accuracy benchmarks should be made available from the OpenCV doc. This could simply be in Doxygen form, or maybe even in PDF form for easy citation?

@asmorkalov
Contributor

@asmorkalov asmorkalov commented May 8, 2020

@vpisarev could you look at it?

@catree
Contributor Author

@catree catree commented May 10, 2020

In hindsight, the wording "technical note" (in the sense of something citable) is probably too "strong". I see G-API and maybe the future SIFT implementation improvements as things that could fit.

What I would like to emphasize with the different examples is the need for a strong focus on OpenCV documentation and tutorials.


Harris example:

  • most likely, the code used in the original issue to extract the Harris corners from the response map comes from this tutorial
  • the issue is that, in my opinion, the goodFeaturesToTrack() function or the GFTTDetector detector should be used instead of the cornerHarris() function
  • cornerHarris() can be used in the tutorial to explain the theory, but for a practical implementation goodFeaturesToTrack() should be used to avoid post-processing the response map manually
  • another issue: what exactly is returned by cornerHarris()? See the following code:
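import cv2 as cv
import numpy as np

# Harris parameters
blockSize = 1
apertureSize = 3
k = 0.04

# 8x8 black image with a filled white 4x4 square
img = np.zeros((8,8), dtype=np.uint8)
cv.rectangle(img, (2,2), (5,5), 255, thickness=-1)

# same image, once as 8-bit input and once as float32 input
dst = cv.cornerHarris(img, blockSize, apertureSize, k)
dst_flt = cv.cornerHarris(img.astype(np.float32), blockSize, apertureSize, k)

print('img:\n', img)
print('dst:\n', dst)
print('dst_flt:\n', dst_flt)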
  • it returns:
img:
 [[  0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0]
 [  0   0 255 255 255 255   0   0]
 [  0   0 255 255 255 255   0   0]
 [  0   0 255 255 255 255   0   0]
 [  0   0 255 255 255 255   0   0]
 [  0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0]]
dst:
 [[ 0.        0.        0.        0.        0.        0.        0.
   0.      ]
 [ 0.       -0.000625 -0.015625 -0.04     -0.04     -0.015625 -0.000625
   0.      ]
 [ 0.       -0.015625 -0.050625 -0.04     -0.04     -0.050625 -0.015625
   0.      ]
 [ 0.       -0.04     -0.04      0.        0.       -0.04     -0.04
   0.      ]
 [ 0.       -0.04     -0.04      0.        0.       -0.04     -0.04
   0.      ]
 [ 0.       -0.015625 -0.050625 -0.04     -0.04     -0.050625 -0.015625
   0.      ]
 [ 0.       -0.000625 -0.015625 -0.04     -0.04     -0.015625 -0.000625
   0.      ]
 [ 0.        0.        0.        0.        0.        0.        0.
   0.      ]]
dst_flt:
 [[ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]
 [ 0.0000000e+00 -2.6426565e+06 -6.6066416e+07 -1.6913002e+08
  -1.6913002e+08 -6.6066416e+07 -2.6426565e+06  0.0000000e+00]
 [ 0.0000000e+00 -6.6066416e+07 -2.1405517e+08 -1.6913002e+08
  -1.6913002e+08 -2.1405517e+08 -6.6066416e+07  0.0000000e+00]
 [ 0.0000000e+00 -1.6913002e+08 -1.6913002e+08  0.0000000e+00
   0.0000000e+00 -1.6913002e+08 -1.6913002e+08  0.0000000e+00]
 [ 0.0000000e+00 -1.6913002e+08 -1.6913002e+08  0.0000000e+00
   0.0000000e+00 -1.6913002e+08 -1.6913002e+08  0.0000000e+00]
 [ 0.0000000e+00 -6.6066416e+07 -2.1405517e+08 -1.6913002e+08
  -1.6913002e+08 -2.1405517e+08 -6.6066416e+07  0.0000000e+00]
 [ 0.0000000e+00 -2.6426565e+06 -6.6066416e+07 -1.6913002e+08
  -1.6913002e+08 -6.6066416e+07 -2.6426565e+06  0.0000000e+00]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]]
  • most likely IPP is used for certain input parameters, but this is not mentioned in the doc (one can expect IPP to be used, but I think in this case only for certain parameter sizes)

AprilTag3

  • AprilTag2 dictionaries can already be decoded, see opencv/opencv_contrib#1637
  • AprilTag2 corner extraction accuracy can be obtained thanks to this PR: opencv/opencv_contrib#1570
  • having AprilTag3 in OpenCV would be great, but in my opinion only if it retains the original performance and accuracy
  • otherwise, the doc should explicitly mention the performance of the current OpenCV implementation and link to the original code, to avoid confusion between the two
  • since OpenCV is much "bigger", I would find it unfortunate for users to get bad performance with the OpenCV AprilTag code while the original code just works fine
  • another example of disappointing performance: the OpenCV QR code implementation

To summarize a post that is once again too long: better documentation is needed.
For sure, human resources and funding are lacking. Improving the OpenCV tutorials is not an interesting task for a GSoC student. It is also a dedicated job. Hopefully, the following improvements could be made in the future:

  • refresh, update, improve the starter tutorials for newcomers:
  • update and improve the other tutorials
  • main documentation needs improvements, for instance:
    • which function is/can be accelerated with IPP? with OpenCL?
    • input/output accepted types for function parameters
    • document whether in-place operation is possible or not
    • sometimes implementation details can be useful

Having more implementation details and performance and accuracy results would be great, but the priority is definitely the documentation.

@catree
Contributor Author

@catree catree commented May 10, 2020

To finish with an intentionally provocative post, this is a comment about code quality and software design in OpenCV.

In my opinion, some observations in the linked comment deserve a look. For instance:

  • better API design for the user:
    • it should be better now that only source compatibility is required, but still
    • better design to avoid having too many overloaded functions with different parameters
    • better documentation to know which input is accepted, see also #4449
    • issue with consistency in function design, see also #10631
  • about the "generic RANSAC kernel design":
    • yes, ideally a generic RANSAC should be used, so that it can be reused for Homography, PnP, etc.
    • this should already be the case in some parts, I think, but since there is a GSoC project focusing on RANSAC, it would be great to have something generic that can easily be tuned or adapted for the different estimation methods (Homography, Essential matrix, ...), if possible
    • in general, it is the lack of genericity that seems problematic
  • about the "kitchen sink":
    • new features are being added or will be added, but at the same time there are already some issues in the existing code that should be fixed
    • disappointing performance of certain features
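
To illustrate what "generic RANSAC" could mean in the list above, here is a minimal NumPy sketch in which the loop knows nothing about the model: the estimator is passed in as a `fit` and a `residuals` callback. This is not OpenCV's implementation; the function names, the line-fitting example and all parameter values are assumptions made for the illustration.

```python
import numpy as np


def ransac(data, fit, residuals, min_samples, threshold, iterations=200, seed=0):
    """Generic RANSAC loop; the model is defined only by `fit` and `residuals`."""
    rng = np.random.default_rng(seed)
    best_model = None
    best_inliers = np.zeros(len(data), dtype=bool)
    for _ in range(iterations):
        # Draw a minimal sample and fit a candidate model to it.
        sample = data[rng.choice(len(data), size=min_samples, replace=False)]
        model = fit(sample)
        if model is None:  # degenerate sample, skip it
            continue
        # Keep the candidate with the largest consensus set.
        inliers = residuals(model, data) < threshold
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = model, inliers
    if best_model is not None and best_inliers.sum() >= min_samples:
        # Final refit on all inliers of the best candidate.
        best_model = fit(data[best_inliers])
    return best_model, best_inliers


def fit_line(points):
    # Least-squares line y = slope * x + intercept; None if x values coincide.
    x, y = points[:, 0], points[:, 1]
    if np.ptp(x) < 1e-12:
        return None
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept


def line_residuals(model, points):
    # Vertical distance of each point to the line.
    slope, intercept = model
    return np.abs(points[:, 1] - (slope * points[:, 0] + intercept))


# Synthetic data: y = 2x + 1 with small noise and 30% gross outliers.
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.05, 100)
y[:30] = rng.uniform(-20, 40, 30)
points = np.column_stack([x, y])

model, inliers = ransac(points, fit_line, line_residuals, min_samples=2, threshold=0.3)
print("slope, intercept:", model)
print("number of inliers:", int(inliers.sum()))
```

Swapping `fit_line` and `line_residuals` for, say, a homography estimator and a reprojection error would reuse the same loop unchanged, which is the kind of reuse the generic design is after.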

This is a "rant post".

For sure, the human and financial resources allocated to the OpenCV project should be taken into account, but a focus on API design and code quality is still relevant. Also, due to the long history of OpenCV, some changes cannot be made without breaking user code.

Participation from the OpenCV community is probably disappointing compared to the size of the OpenCV user base. There are still some nice contributions, like the CUDA DNN implementation.

@catree
Contributor Author

@catree catree commented Jun 4, 2020

@lydiakravchenko

Apologies for answering here.


I don't think my posts are suitable for the https://opencv.org/ homepage. They are mostly criticism, which I hope is constructive, plus some improvement suggestions.

Rather, here are some ideas that I think would be more suitable for the https://opencv.org/ homepage:

  • recent improvements / features about DNN and CUDA capability:
    • performance numbers for the CUDA DNN backend?
    • newly supported DNN networks like Efficient-Net or YOLOv4?
    • see the corresponding PR and ask the author if he wants to advertise his work on the homepage?
  • results / reports once GSoC 2020 is finished:
  • links to OpenCV talks, if any?
  • advertisement of some interesting contrib features (more advertisement of stable features from the OpenCV contrib modules would be desirable)?

In general, I think the motivations for writing on the OpenCV homepage would be:

  • communication from the inside of the OpenCV team (new features, new releases, general news, ...)
  • for external authors, a means to advertise their work or to publish new contributions and features (e.g. a new stereo matching method, new feature matching methods, etc.)

Finally, if the community is big enough, maybe something like the ROS Discourse? But the community must be big enough for an OpenCV Discourse to be useful.
