I'm trying to build an algorithm that can detect which pages of a document need AWS Textract because their content is scanned. The use case is that some parts of a document contain plain text while other parts are scanned, and the two are mixed throughout the document. So I need a way, in Python, to decide when to apply Textract to extract text from the scanned parts, and to use an ordinary text-extraction library for the rest.
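To make the idea concrete, here is a rough sketch of the kind of per-page check I have in mind. It assumes the documents are PDFs and that the third-party `pypdf` package (version 3 or later) is available; neither is a given. The heuristic itself is also an assumption: a page that contains at least one image but yields almost no extractable text is treated as scanned. The names `page_needs_ocr` and `find_scanned_pages` and the `min_chars` threshold are made up for illustration.

```python
from typing import List


def page_needs_ocr(text: str, image_count: int, min_chars: int = 20) -> bool:
    """Guess whether a page is scanned.

    Assumed heuristic: the page has at least one embedded image and
    fewer than `min_chars` non-whitespace characters of extractable text.
    """
    visible_chars = len("".join(text.split()))  # ignore all whitespace
    return image_count > 0 and visible_chars < min_chars


def find_scanned_pages(pdf_path: str, min_chars: int = 20) -> List[int]:
    """Return 0-based indexes of pages that look scanned."""
    # Imported here so the heuristic above works without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    scanned = []
    for index, page in enumerate(reader.pages):
        text = page.extract_text() or ""
        image_count = len(page.images)  # embedded images on this page
        if page_needs_ocr(text, image_count, min_chars):
            scanned.append(index)
    return scanned
```

The pages returned by `find_scanned_pages("document.pdf")` would be the candidates for Textract, and every other page would go through normal text extraction. The threshold probably needs tuning on real documents: a scanned page that already carries an OCR text layer would not be flagged, and a page with a small logo and very little text might be flagged by mistake.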