pydrill Documentation
Release 0.3.4
Wojciech Nowak
Apr 24, 2018
Contents
1 pydrill
1.1 Features
1.2 Installation
1.3 Sample usage
1.4 Supported api calls
2 Installation
3 Usage
3.1 Supported api calls
4 Contributing
4.1 Types of Contributions
4.2 Get Started!
4.3 Pull Request Guidelines
4.4 Tips
5 Credits
5.1 Development Lead
5.2 Contributors
6 History
6.1 0.3.4 (2017-04-24)
6.2 0.3.3 (2017-04-24)
6.3 0.3.2 (2017-04-18)
6.4 0.3.1 (2017-03-06)
6.5 0.3 (2017-02-15)
6.6 0.1.1 (2016-05-21)
6.7 0.1.0 (2016-05-19)
6.8 0.0.2 (2016-04-24)
6.9 0.0.1 (2015-12-28)
7 Indices and tables
CHAPTER 1
pydrill
Python driver for Apache Drill, the schema-free SQL query engine for Hadoop, NoSQL and cloud storage.
• Free software: MIT license
• Documentation: https://pydrill.readthedocs.org.
1.1 Features
• Python 2/3 compatibility
• Support for all REST API calls, including profiles, options and metrics (see the full list below)
• Mapping of results to native Python types
• Compatibility with pandas DataFrames
• Drill authentication using PAM
1.2 Installation
Stable release from PyPI (https://pypi.python.org/pypi/pydrill):
$ pip install pydrill
Latest version from git:
$ pip install git+https://github.com/PythonicNinja/pydrill.git
1.3 Sample usage
from pydrill.client import PyDrill

drill = PyDrill(host='localhost', port=8047)

if not drill.is_active():
    raise RuntimeError('Please run Drill first')

yelp_reviews = drill.query('''
    SELECT * FROM
    `dfs.root`.`./Users/macbookair/Downloads/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_review.json`
    LIMIT 5
''')

for result in yelp_reviews:
    print("%s: %s" % (result['type'], result['date']))

# pandas dataframe
df = yelp_reviews.to_dataframe()
print(df[df['stars'] > 3])
1.4 Supported api calls
class pydrill.client.PyDrill(host='localhost', port=8047, trasport_class=<class 'pydrill.transport.Transport'>, connection_class=<class 'pydrill.connection.requests_conn.RequestsHttpConnection'>, auth=None, **kwargs)
>>> drill = PyDrill(host='localhost', port=8047)
>>> drill.is_active()
True
is_active(timeout=2)
Check whether Drill is running and reachable.
Parameters timeout – int
Returns boolean
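When Drill is started alongside the application (for example in a container), is_active() may return False for a few seconds. A minimal polling sketch that relies only on the is_active(timeout=...) method documented above; the helper name wait_until_active and its retry parameters are hypothetical, not part of pydrill:

```python
import time


def wait_until_active(drill, attempts=5, delay=1.0, timeout=2):
    """Poll drill.is_active() until it returns True or attempts run out.

    `drill` is any object with an is_active(timeout=...) method, such as a
    PyDrill instance. Returns True as soon as Drill answers, else False.
    """
    for attempt in range(attempts):
        if drill.is_active(timeout=timeout):
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Usage (assumes a Drillbit listening on localhost:8047):
# from pydrill.client import PyDrill
# drill = PyDrill(host='localhost', port=8047)
# if not wait_until_active(drill, attempts=10, delay=2.0):
#     raise RuntimeError('Drill did not start in time')
```

Taking the client as a parameter keeps the helper independent of how the PyDrill instance was configured.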
metrics(timeout=10)
Get the current memory metrics.
Parameters timeout – int
Returns pydrill.client.Result
options(timeout=10)
List the name, default, and data type of the system and session options.
Parameters timeout – int
Returns pydrill.client.Result
perform_request(method, url, params=None, body=None)
plan(sql, timeout=10)
Parameters
• sql – string
• timeout – int
Returns pydrill.client.ResultQuery
profile(query_id, timeout=10)
Get the profile of the query that has the given query ID.
Parameters
• query_id – The UUID of the query in standard UUID format that Drill assigns to each query.
• timeout – int
Returns pydrill.client.Result
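Since query_id must be a UUID in standard format, a caller may want to check the value before sending a request. A minimal sketch using Python's uuid module; is_query_id is a hypothetical helper, and it only checks the format, not whether Drill actually knows the query:

```python
import uuid


def is_query_id(value):
    """Return True if `value` is a UUID string in canonical 8-4-4-4-12 form."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, TypeError, AttributeError):
        # Not a string, or not parseable as a UUID at all.
        return False
    # uuid.UUID also accepts braces, 'urn:uuid:' prefixes and bare hex,
    # so compare against the canonical lowercase hyphenated form.
    return str(parsed) == value.lower()


# Usage (drill is a PyDrill instance):
# if is_query_id(query_id):
#     result = drill.profile(query_id)
```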
profile_cancel(query_id, timeout=10)
Cancel the query that has the given query ID.
Parameters
• query_id – The UUID of the query in standard UUID format that Drill assigns to each query.
• timeout – int
Returns pydrill.client.Result
profiles(timeout=10)
Get the profiles of running and completed queries.
Parameters timeout – int
Returns pydrill.client.Result
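profiles() and profile_cancel() can be combined to cancel every running query. The sketch below assumes that the returned Result exposes the decoded JSON as .data, and that Drill lists running queries under 'runningQueries' with a 'queryId' key per entry; both are assumptions to verify against your Drill and pydrill versions, and the helper names are hypothetical:

```python
def running_query_ids(profiles_data):
    """Extract the IDs of running queries from a profiles() payload.

    Assumes a dict with a 'runningQueries' list whose items carry a
    'queryId' key; entries without that key are ignored.
    """
    return [q['queryId'] for q in profiles_data.get('runningQueries', [])
            if 'queryId' in q]


def cancel_running_queries(drill, timeout=10):
    """Cancel every running query and return the list of cancelled IDs.

    Assumes drill.profiles() returns an object whose .data attribute holds
    the decoded JSON payload.
    """
    ids = running_query_ids(drill.profiles(timeout=timeout).data)
    for query_id in ids:
        drill.profile_cancel(query_id, timeout=timeout)
    return ids
```

Splitting the payload parsing from the network calls keeps the key names in one place if they differ in your Drill version.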
query(sql, timeout=10)
Submit a query and return results.
Parameters
• sql – string
• timeout – int
Returns pydrill.client.ResultQuery
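query() talks to Drill's REST API, which documents the query endpoint as POST /query.json with a JSON body carrying queryType and query fields. The standard-library sketch below only builds such a request without sending it, which can help when debugging outside pydrill; build_query_request is hypothetical, targets Python 3, and is not how pydrill is implemented internally:

```python
import json
from urllib.request import Request


def build_query_request(sql, host='localhost', port=8047):
    """Build (but do not send) a POST request to Drill's /query.json."""
    body = json.dumps({'queryType': 'SQL', 'query': sql}).encode('utf-8')
    return Request(
        'http://%s:%d/query.json' % (host, port),
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


# Sending it requires a running Drillbit:
# from urllib.request import urlopen
# req = build_query_request('SELECT * FROM cp.`employee.json` LIMIT 5')
# print(json.loads(urlopen(req).read().decode('utf-8')))
```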
stats(timeout=10)
Get Drillbit information, such as port numbers.
Parameters timeout – int
Returns pydrill.client.Stats
storage(timeout=10)
Get the list of storage plugin names and configurations.
Parameters timeout – int
Returns pydrill.client.Result
storage_delete(name, timeout=10)
Delete a storage plugin configuration.
Parameters
• name – The name of the storage plugin configuration to delete.
• timeout – int
Returns pydrill.client.Result
storage_detail(name, timeout=10)
Get the definition of the named storage plugin.
Parameters
• name – The assigned name in the storage plugin definition.
• timeout – int
Returns pydrill.client.Result
storage_enable(name, value=True, timeout=10)
Enable or disable the named storage plugin.
Parameters
• name – The assigned name in the storage plugin definition.
• value – Either True (to enable) or False (to disable).
• timeout – int
Returns pydrill.client.Result
storage_update(name, config, timeout=10)
Create or update a storage plugin configuration.
Parameters
• name – The name of the storage plugin configuration to create or update.
• config – The new configuration. It overwrites any existing configuration, and therefore must include all required attributes and definitions.
• timeout – int
Returns pydrill.client.Result
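Because config overwrites the existing configuration, a partial change is safest as read-modify-write: fetch the current definition, merge the change, and send the full result back. A sketch of the merge step; merged_config is hypothetical, and the commented usage assumes that storage_detail(...).data holds the definition with its settings under 'config' and that storage_update accepts that inner dict, so check both against your pydrill version:

```python
import copy


def merged_config(existing, changes):
    """Return a copy of `existing` with `changes` merged in recursively.

    Nested dicts are merged key by key; any other value in `changes`
    replaces the existing one. `existing` itself is never modified.
    """
    result = copy.deepcopy(existing)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merged_config(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical usage with a PyDrill instance:
# current = drill.storage_detail('dfs').data['config']
# new = merged_config(current, {'workspaces': {'tmp': {'writable': True}}})
# drill.storage_update('dfs', new)
```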
threads(timeout=10)
Get the status of threads.
Parameters timeout – int
Returns pydrill.client.Result
CHAPTER 2
Installation
At the command line:
$ pip install pydrill
Or, if you have virtualenvwrapper installed:
$ mkvirtualenv pydrill
$ pip install pydrill
Latest version from git:
$ pip install git+https://github.com/PythonicNinja/pydrill.git
CHAPTER 3
Usage
To use pydrill in a project:
from pydrill.client import PyDrill
drill = PyDrill(host='localhost', port=8047)
You can also set the host and port through the environment variables PYDRILL_HOST and PYDRILL_PORT.
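Making the environment lookup explicit can help when the values need validating or logging before connecting. A minimal sketch; drill_settings_from_env is hypothetical and falls back to the defaults from the PyDrill signature (localhost, 8047):

```python
import os


def drill_settings_from_env(environ=None):
    """Return (host, port) read from PYDRILL_HOST and PYDRILL_PORT.

    Falls back to localhost and 8047. The port is converted to int because
    environment values are always strings.
    """
    env = os.environ if environ is None else environ
    host = env.get('PYDRILL_HOST', 'localhost')
    port = int(env.get('PYDRILL_PORT', 8047))
    return host, port


# Usage:
# from pydrill.client import PyDrill
# host, port = drill_settings_from_env()
# drill = PyDrill(host=host, port=port)
```

Accepting an optional mapping instead of always reading os.environ makes the lookup easy to test.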
To use Drill PAM authentication, pass the credentials through the auth parameter:
drill = PyDrill(auth='drill_user:drill_password')
To enable a specific storage plugin:
drill.storage_enable('mongo')
To list all running and completed queries:
drill.profiles()
To check if Drill is running:
if drill.is_active():
    ...  # your code here
Running a query only requires the SQL:
employees = drill.query('''SELECT * FROM cp.`employee.json` LIMIT 5
''')
for employee in employees:
    print(employee)
If you would rather not build SQL strings by hand, try pydrill_dsl: https://pypi.python.org/pypi/pydrill_dsl
Support for pandas:
# pandas dataframe
df = employees.to_dataframe()
print(df[df['salary'] > 20000])
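Without pandas, the same filter can be applied to the rows directly, since a query result can be iterated row by row as dictionaries. The sketch below converts values with float() in case numeric columns come back as strings (an assumption about the REST payload); rows_where_greater is a hypothetical helper:

```python
def rows_where_greater(rows, column, threshold):
    """Yield rows whose `column` value, read as a number, exceeds threshold.

    Rows where the column is missing or not numeric are skipped.
    """
    for row in rows:
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue
        if value > threshold:
            yield row


# Usage with the employees query result:
# for employee in rows_where_greater(employees, 'salary', 20000):
#     print(employee['full_name'])
```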
3.1 Supported api calls
The PyDrill API reference is the same as in section 1.4 Supported api calls.
CHAPTER 4
Contributing
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
You can contribute in many ways:
4.1 Types of Contributions
4.1.1 Report Bugs
Report bugs at https://github.com/PythonicNinja/pydrill/issues.
If you are reporting a bug, please include:
• Your operating system name and version.
• Any details about your local setup that might be helpful in troubleshooting.
• Detailed steps to reproduce the bug.
4.1.2 Fix Bugs
Look through the GitHub issues for bugs. Anything tagged with “bug” is open to whoever wants to implement it.
4.1.3 Implement Features
Look through the GitHub issues for features. Anything tagged with “feature” is open to whoever wants to implement it.
4.1.4 Write Documentation
pydrill could always use more documentation, whether as part of the official pydrill docs, in docstrings, or even on the web in blog posts, articles, and such.
4.1.5 Submit Feedback
The best way to send feedback is to file an issue at https://github.com/PythonicNinja/pydrill/issues.
If you are proposing a feature:
• Explain in detail how it would work.
• Keep the scope as narrow as possible, to make it easier to implement.
• Remember that this is a volunteer-driven project, and that contributions are welcome :)
4.2 Get Started!
Ready to contribute? Here’s how to set up pydrill for local development.
1. Fork the pydrill repo on GitHub.
2. Clone your fork locally:
$ git clone git@github.com:your_name_here/pydrill.git
3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:
$ mkvirtualenv pydrill
$ cd pydrill/
$ python setup.py develop
4. Create a branch for local development:
$ git checkout -b name-of-your-bugfix-or-feature
Now you can make your changes locally.
5. When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:
$ source ./run_docker   # exports PYDRILL_HOST and PYDRILL_PORT
or
$ export PYDRILL_HOST='127.0.0.1'
$ export PYDRILL_PORT='8047'

$ tox -e run-isort      # sorts imports in place
$ tox -e check-isort    # checks that imports are sorted correctly
$ tox -e check-flake8   # checks code quality with flake8
$ tox -e py27           # runs the tests with Python 2.7
$ tox                   # runs all environments
To get flake8 and tox, just pip install them into your virtualenv.
6. Commit your changes and push your branch to GitHub:
$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature
7. Submit a pull request through the GitHub website.
4.3 Pull Request Guidelines
Before you submit a pull request, check that it meets these guidelines:
1. The pull request should include tests.
2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.rst.
3. The pull request should work for Python 2.6, 2.7, 3.3, 3.4 and 3.5, and for PyPy. Check https://travis-ci.org/PythonicNinja/pydrill/pull_requests and make sure that the tests pass for all supported Python versions.
4.4 Tips
To run a subset of tests:
$ tox -e py27 -- -k threads   # runs only the tests whose names match "threads"
CHAPTER 5
Credits
5.1 Development Lead
• Wojciech Nowak <[email protected]>
5.2 Contributors
None yet. Why not be the first?
CHAPTER 6
History
6.1 0.3.4 (2017-04-24)
• Updated pypi listing long_description
6.2 0.3.3 (2017-04-24)
• Fix pypi installation
6.3 0.3.2 (2017-04-18)
• Support for dtype on to_dataframe
6.4 0.3.1 (2017-03-06)
• Support for Drill Authentication using PAM
6.5 0.3 (2017-02-15)
• requests response encoding (utf-8)
• Python 3.6 support
6.6 0.1.1 (2016-05-21)
• Anaconda requirements fixed
6.7 0.1.0 (2016-05-19)
• First minor release
• Updated docs
6.8 0.0.2 (2016-04-24)
• First release on PyPI.
• Implementation of metrics/storage/options/stats
• Builds are tested by docker container with Apache Drill running
• support for pandas with ResultQuery.to_dataframe
6.9 0.0.1 (2015-12-28)
• Project start
CHAPTER 7
Indices and tables
• genindex
• modindex
• search
Index
I
is_active() (pydrill.client.PyDrill method)
M
metrics() (pydrill.client.PyDrill method)
O
options() (pydrill.client.PyDrill method)
P
perform_request() (pydrill.client.PyDrill method)
plan() (pydrill.client.PyDrill method)
profile() (pydrill.client.PyDrill method)
profile_cancel() (pydrill.client.PyDrill method)
profiles() (pydrill.client.PyDrill method)
PyDrill (class in pydrill.client)
Q
query() (pydrill.client.PyDrill method)
S
stats() (pydrill.client.PyDrill method)
storage() (pydrill.client.PyDrill method)
storage_delete() (pydrill.client.PyDrill method)
storage_detail() (pydrill.client.PyDrill method)
storage_enable() (pydrill.client.PyDrill method)
storage_update() (pydrill.client.PyDrill method)
T
threads() (pydrill.client.PyDrill method)