Foundations of Computer Vision (2024) (visionbook.mit.edu)

209 points by tzury 1 day ago | 16 comments

pantulis 16 hours ago [-]

There is a very interesting section in the book, "On Research, Writing and Speaking", which includes gems like:

“This sounds like hard work.” Yes. It’s no longer about being smart. By now, everyone around you is smart. In graduate school, it’s the hard workers who pull ahead.

bonoboTP 15 hours ago [-]

That's definitely insightful. Everyone reaches a level where coasting on smarts is no longer sufficient.

Many reach this realization when starting university, but some can still coast okay in college since the material to learn is well defined and upper bounded. A PhD is not really upper bounded. There's no set number of papers to read per week like in a college course. There's no "this won't be part of the exam". Anything is fair game. The returns on being smarter never flatten out; there's simply no ceiling. You can always do more, read more to keep up with the literature firehose, improve your experiments, your method, etc.

You also need soft skills and a network. You need to keep your finger on the pulse of the community by going to conferences and getting to know people, grabbing coffee or going out to dinner with them. You also need to be self-driven instead of waiting for instructions as you did in college. You need to be just the right amount of skeptical and critical regarding existing methods to be able to come up with new things, while still being understood, accepted, and seen as relevant and exciting by the community.

You also need to manage your time, set your own deadlines, and maintain a routine without the external sync given by university lectures and exams. All this basically has no upper limit and even the expectations are vaguely defined. You face rejections, maybe for the first time, despite having done thorough work, because the reviewers don't see enough novelty or it doesn't slot neatly into what is in fashion at the moment.

My point is that a PhD can push everyone to meet their mental limits. It can be frustrating and it's a notoriously hard period of time for many PhD students. Of course if your only goal is to graduate to get the doctorate, there are possible strategies to "coast", but those who go for the academic path often expect to achieve more than the bare minimum, especially if they managed to coast with good results in college.

oytis 2 hours ago [-]

Can someone working in the field comment on how relevant the content still is? A lot of ML including CV seems (from the outside at least) to be completely disrupted by the developments of the last two years.

bonoboTP 2 hours ago [-]

Very relevant. None of the recent techniques are truly revolutionary. It's all based on these same foundations. I'd say it would do you good to read even older ones. There are lots of real, profitable computer vision applications built on classic methods like Hough transforms, Canny edges, SIFT, Harris corners, etc. You should be familiar with these if you want to come across as a serious professional as opposed to a hype boy vibe coder who can just rattle off buzzwords and glue APIs together without fundamental understanding.

walterlw 1 hour ago [-]

There are still a lot of problems to be solved using "classical" computer vision, especially in systems where you don't have easy access to GPU acceleration. I am a practitioner doing simultaneous localization and mapping (SLAM) on compute-restricted platforms, so I'm definitely going to read the Structure from Motion chapter.

AdieuToLogic 12 hours ago [-]

Another great book in this field is:

  Computer Vision, Fifth Edition
  E.R. Davies
  Academic Press
  ISBN-13  978-0128092842

bonoboTP 2 hours ago [-]

The other main one is Szeliski's Computer Vision 2nd Ed from 2022 https://szeliski.org/Book/

Forsyth & Ponce is also good but somewhat old by now. And for 3D, the classic is still Hartley & Zisserman's Multiple View Geometry.

hananova 4 hours ago [-]

The "Writing this book" section accidentally implies that LLMs were used for 2/3rds of the manuscript.

I think they probably mean that LLMs just gave them a lot more to write about, but I think it would be a good idea to clarify.

oytis 3 hours ago [-]

I am not reading it like this. In fact, ChatGPT was the first thing out there that would have been able to assist them in writing, and less than a third of this book was written after the release of ChatGPT. To me it just looks like marking important events in the ML/AI field on the graph.

la_fayette 15 hours ago [-]

Unbelievable that this book is freely available! Thanks to the authors, publishers or whoever.

walterlw 1 hour ago [-]

Very true and joining in on the thanks. Did you find a way to download it as a PDF though? I believe it is essential to be able to add notes and references when reading any learning material.

bonoboTP 15 hours ago [-]

The machine learning, computer vision and robotics communities are really great at publishing their books online for free access. You can get the absolute top textbooks of these fields for free online. Quite a contrast to other fields where profs kinda require you to buy the latest edition for hundreds of dollars in the US. Not to mention that this gives everyone around the world, including in poorer countries, access to the best resources. Many also share their course materials and videos online.

vincenthwt 11 hours ago [-]

Can anyone recommend a good book on Machine Vision? I believe the foundation of effective machine vision, and even computer vision, lies in selecting the right camera, optics, and lighting. High-quality images are essential because poor input leads to poor output.

ack_inc 6 hours ago [-]

Hi, could you mention a use-case or two where these things made a real difference?

bonoboTP 16 minutes ago [-]

The term "machine vision" is mainly used in highly controlled, narrow industrial applications, think factory assembly lines, steel inspection, monitoring for cracks in materials, shape or size classification of items, etc. The task is usually very well defined, and the same thing needs to be repeated under essentially the same conditions over and over again with high reliability.

But many other things exist outside the "glue some GPT-4o vision API stuff together for a mobile app to pitch to VCs" space. Like inspecting and servicing airplanes (Airbus has vision engineers who make tools for internal use, you don't have datasets of a billion images for that). There are also things like 3D motion capture of animals, such as mice or even insects like flies, which requires very precise calibration and proper optical setups. Or estimating the meat yield of pigs and cows on farms from multi-view images combined with weight measurements. There are medical things, like cell counting, 3D reconstruction of facial geometry for plastic surgery, dentistry applications, and a million things other than chatting with ChatGPT about images or classifying cats vs dogs or drawing bounding boxes of people in a smartphone video.

jeffreygoesto 5 hours ago [-]

Any serious production inspection.
