Natural Language Processing (NLP) and CountVectorizer by ...

Flask program giving module not found

I'm building a Python web application with Flask and uWSGI following this lovely guide, and it worked marvels: the basic hello does indeed appear on the website when loaded. I have installed every single module and dependency in the project folder. I'm now trying to build on the working script, and my file looks like this:
    from flask import Flask
    import pylab as pl
    import numpy as np
    import pandas as pd

    from sklearn import svm
    from sklearn import tree
    import matplotlib.pyplot as plt
    from sklearn import linear_model
    from sklearn.pipeline import Pipeline
    from sklearn.metrics import confusion_matrix
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.linear_model import SGDClassifier
    from mlxtend.plotting import plot_decision_regions
    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_extraction.text import TfidfTransformer

    app = Flask(__name__)

    def hello():
        data = pd.read_csv('test1.csv', error_bad_lines=False, delimiter=',')
        numpy_array = data.as_matrix()
        #print numpy_array

        # text in column 1, classifier in column 2.
        X = numpy_array[:, 0]
        Y = numpy_array[:, 1]
        Y = Y.astype(np.str)

        # divide the test set and set the variable to their correct label/text
        X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.4, random_state=42)

        # MultinomialNB
        text_clf = Pipeline([('vect', CountVectorizer(stop_words='english')),
                             ('tfidf', TfidfTransformer()),
                             ('clf', MultinomialNB()),])
        text_clf = text_clf.fit(X_train.astype('U'), Y_train.astype('U'))
        predicted = text_clf.predict(X_test)

        # print the actual accuracy
        print "MNB accuracy: ", np.mean(predicted == Y_test)

        # make the confusion matrix
        y_actu = pd.Series(Y_test, name='Actual')
        y_pred = pd.Series(predicted, name='Predicted')
        df_confusion = pd.crosstab(y_actu, y_pred)
        print df_confusion
        print "----------------------------------------"

        # SVM
        vect = CountVectorizer(min_df=0., max_df=1.0)
        X = vect.fit_transform(X_train.astype('U'))
        min_frequency = 22
        text_clf_svm = Pipeline([('vect', CountVectorizer(min_df=min_frequency, stop_words='english')),
                                 ('tfidf', TfidfTransformer()),
                                 ('clf-svm', SGDClassifier(loss='hinge', penalty='l2', alpha=1e-03,
                                                           n_iter=1000, random_state=21))])
        text_clf_svm = text_clf_svm.fit(X_train.astype('U'), Y_train.astype('U'))
        predicted_svm = text_clf_svm.predict(X_test)

        # print the actual accuracy
        print "svm accuracy: ", np.mean(predicted_svm == Y_test)

        # make the confusion matrix
        y_actu = pd.Series(Y_test, name='Actual')
        y_pred = pd.Series(predicted_svm, name='Predicted')
        df_confusion = pd.crosstab(y_actu, y_pred)
        print df_confusion

    if __name__ == "__main__":
All is good with this as far as I'm concerned. I made sure to install all dependencies and modules in the folder from which I'm running the code, but when I run it I get the following error:
    [root@python-political-bias-app fyp]# semodule -i mynginx.pp
    [root@python-political-bias-app fyp]# env/bin/uwsgi --socket -w WSGI:app &
    [1] 1710
    [root@python-political-bias-app fyp]# *** Starting uWSGI 2.0.15 (64bit) on [Wed Feb 7 01:16:21 2018] ***
    compiled with version: 4.8.5 20150623 (Red Hat 4.8.5-16) on 06 February 2018 20:03:13
    os: Linux-3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018
    nodename: python-political-bias-app
    machine: x86_64
    clock source: unix
    pcre jit disabled
    detected number of CPU cores: 1
    current working directory: /root/fyp
    detected binary path: /root/fyp/env/bin/uwsgi
    uWSGI running as root, you can use --uid/--gid/--chroot options
    *** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
    *** WARNING: you are running uWSGI without its master process manager ***
    your processes number limit is 3807
    your memory page size is 4096 bytes
    detected max file descriptor number: 1024
    lock engine: pthread robust mutexes
    thunder lock: disabled (you can enable it with --thunder-lock)
    uwsgi socket 0 bound to TCP address fd 3
    Python version: 2.7.5 (default, Aug 4 2017, 00:39:18) [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]
    *** Python threads support is disabled. You can enable it with --enable-threads ***
    Python main interpreter initialized at 0x74bba0
    your server socket listen backlog is limited to 100 connections
    your mercy for graceful operations on workers is 60 seconds
    mapped 72768 bytes (71 KB) for 1 cores
    *** Operational MODE: single process ***
    Traceback (most recent call last):
      File "./", line 1, in
        from app import app
      File "./app/", line 2, in
        import pylab as pl
    ImportError: No module named pylab
    unable to load app 0 (mountpoint='') (callable not found or import error)
    *** no app loaded. going in full dynamic mode ***
    *** uWSGI is running in multiple interpreter mode ***
I'm pretty lost as to why. Any pointers would really help. The code itself runs perfectly well locally, so I'm not sure what's going on.
submitted by Lucasxhy to flask
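
One way to narrow down an ImportError like the one in the post is to ask the interpreter that uWSGI actually runs which modules it can import. What follows is a minimal sketch, not a confirmed fix: `missing_modules` is a hypothetical helper name, and the module list is assumed from the imports shown in the post. Since `pylab` is installed as part of matplotlib, a missing `pylab` would suggest that matplotlib is not installed in the environment uWSGI is using.

```python
import importlib
import sys


def missing_modules(names):
    """Return the names from `names` that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Show which interpreter and environment prefix are in use.
    print(sys.executable)
    print(sys.prefix)
    # Module names assumed from the imports in the post.
    wanted = ["flask", "pylab", "numpy", "pandas", "sklearn", "mlxtend"]
    print(missing_modules(wanted))
```

Running this with the virtualenv's interpreter (for example `env/bin/python`, assuming the layout suggested by the log) and again with the system Python should show whether the two environments have different packages installed.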

You can pass a callable as the analyzer argument to get full control over the tokenization, e.g.

    >>> from pprint import pprint
    >>> import re
    >>> x = ['this is a foo bar', 'you are a foo bar black sheep']
    >>> def words_and_char_bigrams(text): ...

binary : boolean, default=False. If True, all non zero counts are set to 1. This is useful for discrete probabilistic models that model binary events rather than integer counts. dtype : type, optional. Type of the matrix returned by fit_transform() or transform(). Attributes: vocabulary_ : dict. A mapping of terms to feature indices. stop_words_ : set. Terms that were ignored because they ...

Sunday, October 16, 2016. Countvectorizer Binary Options

CountVectorizer (*, input='content', encoding='utf-8', decode_error='strict', ... binary bool, default=False. If True, all non zero counts are set to 1. This is useful for discrete probabilistic models that model binary events rather than integer counts. dtype type, default=np.int64. Type of the matrix returned by fit_transform() or transform(). Attributes vocabulary_ dict. A mapping of terms ...

The following are 30 code examples for showing how to use sklearn.feature_extraction.text.CountVectorizer(). These examples are extracted from open source projects.

Natural Language Processing (NLP) is how we teach machines the human language of communication. In this article, my purpose is to show you how the sklearn library's CountVectorizer can be used in ...

In the current version (0.14.1), there's a bug where TfidfVectorizer(binary=True, ...) silently leaves binary=False, which can throw you off during a grid search for the best parameters. (CountVectorizer, in contrast, sets the binary flag correctly.) This appears to be fixed in future (post-0.14.1) versions.

CountVectorizer finds words in your text using the token_pattern regex. By default this only matches a word if it is at least 2 characters long, and will only generate counts for those words. In your case, the words are only '0' and '1', which are both just 1 character, so they get excluded from the vocabulary, meaning that fit_transform fails. ...

"For me the love should start with attraction. I should feel that I need her every time around me. She should be the first thing which comes in my thoughts. I would start the day and end it with her. She should be there every time I will be then when my every breath has her life should happen around life will be named to her. I would cry for her. Will give all my happiness ...

Text data requires special preparation before you can start using it for predictive modeling. The text must be parsed to split it into words, called tokenization. Then the words need to be encoded as integers or floating point values for use as input to a machine learning algorithm, called feature extraction (or vectorization). The scikit-learn library offers easy-to-use tools to perform both ...
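
The token_pattern behaviour described above, where single-character words such as '0' and '1' are dropped, can be reproduced with the standard `re` module alone. This is only a sketch under stated assumptions: it assumes the default pattern is `(?u)\b\w\w+\b` as given in the scikit-learn documentation, and `tokenize` is a hypothetical helper that imitates the tokenization step; it is not CountVectorizer itself.

```python
import re

# Assumed default: a token needs at least two word characters.
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"
# Relaxed alternative: a single word character is enough.
SINGLE_CHAR_PATTERN = r"(?u)\b\w+\b"


def tokenize(text, pattern=DEFAULT_PATTERN):
    """Lowercase the text and return every token matched by `pattern`."""
    return re.findall(pattern, text.lower())


docs = ["0 1 1 0", "1 0 0 1"]

# With the default pattern no token survives, so the vocabulary would be empty.
print([tokenize(d) for d in docs])

# With the relaxed pattern the single-character tokens are kept.
print([tokenize(d, SINGLE_CHAR_PATTERN) for d in docs])
```

With the real class, passing `token_pattern=r"(?u)\b\w+\b"` to `CountVectorizer` should have the same effect of keeping one-character tokens.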
